We propose a description for the quasi-equilibrium self-assembly of small, single-stranded (ss) RNA viruses 
whose capsid proteins (CPs) have flexible, positively charged, disordered tails that associate with the negatively 
charged RNA genome molecules. We describe the assembly of such viruses as the interplay between two 
coupled phase-transition like events: the formation of the protein shell (the capsid) by CPs and the condensation 
of a large ss viral RNA molecule. Electrostatic repulsion between the CPs competes with attractive hydrophobic 
interactions and attractive interaction between neutralized RNA segments mediated by the tail-groups. An 
assembly diagram is derived in terms of the strength of attractive interactions between CPs and between CPs 
and the RNA molecules. It is compared with the results of recent studies of viral assembly. We demonstrate that 
the conventional theory of self-assembly, which does describe the assembly of empty capsids, is in general not 
applicable to the self-assembly of RNA-encapsidating virions. 

PACS numbers: 87.15.bk, 87.15.kj 


I. INTRODUCTION 
A. Self-assembly of small RNA viruses 

Assembly is a key part of the life cycle of a virus. During assembly, structural proteins and genome molecules produced inside an infected cell combine to form virus particles (“virions”). Remarkably, many small viruses with a single-stranded (ss) RNA genome (“vRNA”) will assemble under laboratory conditions in solutions that contain the protein and genome molecular components of the virus [1, 2]. Figure 1 shows a reconstruction [?] of the Flock House Virus (FHV), an example of a small ssRNA virus [?]. The icosahedral shell, or “capsid”, has an inner radius $R_c$ of about 10 nm and a thickness of about 3 nm. It is composed of 180 identical proteins (CPs). In the Caspar-Klug classification of viral capsids, icosahedral shells composed of 180 subunits are known as “$T = 3$” shells. 
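
The relation between the triangulation number and the subunit count can be made explicit with a short sketch. It uses the standard Caspar-Klug rule $T = h^2 + hk + k^2$ with $60T$ subunits per shell; the lattice indices `h` and `k` and the function names are illustrative and do not appear in the text.

```python
def triangulation_number(h: int, k: int) -> int:
    """Caspar-Klug triangulation number T = h^2 + h*k + k^2 for lattice indices (h, k)."""
    return h * h + h * k + k * k


def subunit_count(h: int, k: int) -> int:
    """Number of protein subunits in an icosahedral shell, 60 * T."""
    return 60 * triangulation_number(h, k)


if __name__ == "__main__":
    # (1, 1) gives T = 3, the class of shells discussed in the text.
    for h, k in [(1, 0), (1, 1), (2, 0)]:
        print(f"(h, k) = ({h}, {k}): T = {triangulation_number(h, k)}, "
              f"subunits = {subunit_count(h, k)}")
```

For $(h, k) = (1, 1)$ the rule gives $T = 3$ and 180 subunits, matching the FHV capsid described above.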

The genome of a $T = 3$ virus encodes minimally two proteins: the capsid protein and an RNA-dependent RNA polymerase, together about 4,000 bases. A spherical volume of 10 nm radius can accommodate about 5,000 RNA bases in the form of a (hydrated) crystal of duplex RNA. The density of the minimal RNA genome is thus not far below that of the hydrated RNA crystal form (in some cases the density of the packaged RNA material even exceeds that of the crystal form [5, 6]). The dimensions of ss RNA genome molecules in solution are hard to measure, but the combined evidence from small-angle x-ray scattering, cryo-EM, and fluorescence measurements indicates that vRNA molecules are swollen in physiological solutions. For example, the hydrodynamic radius of the vRNA molecules of the MS2 virus has been estimated to be about 14 nm [?], whereas $R_c$ is about 11 nm for MS2. Genome encapsidation thus requires a significant level of compression of the vRNA molecules [?]. 
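
The numbers quoted above can be turned into a rough estimate of the packing fraction and of the volume compression. The sketch below uses only the figures given in the text; treating the hydrodynamic radius as the radius of an equivalent sphere is a simplifying assumption made here, and the variable names are illustrative.

```python
import math


def sphere_volume(radius_nm: float) -> float:
    """Volume of a sphere in nm^3."""
    return 4.0 / 3.0 * math.pi * radius_nm ** 3


GENOME_BASES = 4000            # minimal T = 3 genome (from the text)
CRYSTAL_CAPACITY_BASES = 5000  # duplex-RNA crystal capacity of a 10 nm sphere (from the text)
R_HYDRODYNAMIC_NM = 14.0       # MS2 vRNA in solution (from the text)
R_CAPSID_NM = 11.0             # MS2 capsid inner radius (from the text)

# Fraction of the crystal-density capacity taken up by the minimal genome.
fill_fraction = GENOME_BASES / CRYSTAL_CAPACITY_BASES

# Volume ratio between the swollen coil (assumed spherical) and the capsid interior.
compression = sphere_volume(R_HYDRODYNAMIC_NM) / sphere_volume(R_CAPSID_NM)

if __name__ == "__main__":
    print(f"fill fraction      = {fill_fraction:.2f}")
    print(f"volume compression = {compression:.2f}")
```

Under these assumptions the genome fills about 80% of the crystal-density capacity, and the MS2 vRNA must shrink to roughly half its solution volume.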



FIG. 1. (Color online) X-ray reconstruction of a cross-section of a $T = 3$ virus-like particle (from Ref. [?]). The capsid is composed of 180 identical copies of Flock House Virus capsid proteins (“wild-type” or wt) arranged in an icosahedral shell (outer layer). The encapsidated RNA material is non-viral, so there are no specific protein-RNA interactions. Only the part of the RNA material that has icosahedral symmetry is shown. The radius of the condensed RNA globule, indicated by an arrow, is about 10 nm. The stars indicate two-fold close contacts between the enclosed RNA globule and the capsid. The image is reproduced, with permission, from Ref. [?] [copyright (2004) American Society for Microbiology]. 


Viral self-assembly is driven by the competition between repulsive and attractive macromolecular interactions. It is well known that specific affinities - which involve stem-loop and tRNA-like motifs of the native vRNA molecules that bind preferentially to the viral CPs in question - significantly speed up assembly kinetics and allow for assembly at lower concentrations of the components. Nevertheless, self-assembly studies of CPs with non-native RNA molecules (e.g., [?]) indicate that generic interactions are in general capable of packaging ssRNA molecules in the absence of specific CP-RNA affinities, albeit with reduced yield. For example, the virus-like particle shown in Fig. 1 packages non-native ssRNA material. This article will focus exclusively on viral self-assembly driven by generic interactions. 

Among the generic interactions, electrostatics plays a central role. The CPs of $T = 3$ ssRNA viruses typically - but not always - have a negatively charged “head group” and a positively charged “tail group” (see Fig. 2). The pH-dependent negative charge $-eZ_h$ of the head group is located mostly on the part of the CP that faces the capsid exterior, while the pH-independent positive charge $+eZ_t$ of the tail group faces the capsid interior. Typically, $Z_h \approx Z_t \approx 10$. The net charge of the CPs of the cowpea chlorotic mottle virus (CCMV) - a self-assembling $T = 3$ ssRNA virus whose assembly process has been particularly well studied - is negative under physiological conditions, but if the pH is reduced then the sign changes around pH $\approx 3.6$ [?], the isoelectric point [?]. The charge distribution of CPs that are part of a capsid has a dipolar component that remains very large - of the order of $10^3$ Debye - even at the isoelectric point. If the characteristic energy scale of the electrostatic repulsion between CPs in a 0.1 M salt environment is estimated by Debye-Hückel (DH) theory, then values in the range of $10\,k_BT$ or more are found. 
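
Two of the scales in this paragraph can be checked with a rough calculation: the Debye screening length at 0.1 M monovalent salt, and the dipole moment of $Z \approx 10$ charges separated by the capsid thickness. The sketch assumes a Bjerrum length of 0.7 nm for water near room temperature and a head-to-tail charge separation of 3 nm; both values, and all function names, are assumptions introduced here rather than taken from the paper.

```python
import math

AVOGADRO = 6.02214076e23     # 1/mol
NM3_PER_LITER = 1.0e24       # nm^3 in one liter
E_NM_PER_DEBYE = 0.020819    # one Debye expressed in units of (elementary charge * nm)
BJERRUM_NM = 0.7             # assumed Bjerrum length of water near room temperature


def debye_length_nm(salt_molar: float, bjerrum_nm: float = BJERRUM_NM) -> float:
    """Debye length for a monovalent salt, from kappa^2 = 8 * pi * l_B * c_s."""
    c_s = salt_molar * AVOGADRO / NM3_PER_LITER   # ions of each sign per nm^3
    kappa_squared = 8.0 * math.pi * bjerrum_nm * c_s
    return 1.0 / math.sqrt(kappa_squared)


def dipole_moment_debye(valence: float, separation_nm: float) -> float:
    """Dipole moment of charges +/- valence*e separated by separation_nm, in Debye."""
    return valence * separation_nm / E_NM_PER_DEBYE


if __name__ == "__main__":
    print(f"Debye length at 0.1 M: {debye_length_nm(0.1):.2f} nm")
    print(f"Dipole moment, Z = 10 over 3 nm: {dipole_moment_debye(10, 3.0):.0f} Debye")
```

With these inputs the screening length comes out close to 1 nm, well below the CP dimension, and the dipole moment is of order a thousand Debye.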

The positively charged CP tail groups have an electrostatic affinity for the negatively charged RNA nucleotides. Evidence is provided by the fact that the strength of the affinity varies inversely with the ionic strength of the solution [?]. Measured dissociation constants [?] for CP/RNA association give binding energies in the range of $15\,k_BT$. It should be noted that the CP/vRNA binding affinity can have important contributions coming from correlation effects [?]. Numerical simulations [?] report that the electrostatic affinity involves counterion release. The importance of CP-RNA electrostatic interactions is manifested by the fact that the amount of vRNA that is packaged by a small ssRNA virus is a linear function of the net positive tail charge [?]. 

Importantly, neutralization of the ssRNA material by the positive CP tail charges is incomplete: the interior of a CCMV virion has a large residual macroion charge in the range of $-10^3 e$. This disparity between the CP tail charge and the vRNA charge is a form of “overcharging”, a fundamental issue in the theory of the electrostatics of macroions [?]. Deviations from macroion charge neutrality in aqueous solutions are attributed to constraints and/or correlations that prevent matching of the positive and negative charges. In the context of viral assembly, overcharging was attributed by Hu et al. [?] to the structure of the CP tail group/RNA association and by Belyi et al. [?] to Manning condensation. 

The repulsive electrostatic interactions between CPs, which inhibit capsid assembly, compete with highly directional CP-CP “pairing” attraction [23]. This attraction is provided by a combination of attraction between complementary hydrophobic patches across CP-CP interfaces and pH-dependent proton-mediated pairing interactions between carboxylate groups on residues of adjacent CPs facing each other 


FIG. 2. (Color online) Schematic cross-section of a small ssRNA viral capsid. The positively charged tail groups of the capsid proteins extend inward where they can associate with sections of the negatively charged branched RNA molecules (not shown). The angle $\varphi$ is the relative angle between the normals of adjacent capsid proteins. For an inner radius $R$ of about 10 nm and a characteristic capsid protein (CP) dimension $D$ of about 3 nm, $\varphi \simeq D/R \simeq 0.3$ radians. The figure is roughly to scale for a $T = 3$ virus. 

(“Caspar pairing” [?]). For capsid assembly to take place, the strength of the attractive interactions between CPs must exceed that of the repulsive electrostatic interactions, so it should also be in the range of $10\,k_BT$. The competition between the attractive pairing interactions and the salt- and pH-dependent electrostatic repulsion is illustrated by assembly diagrams of aqueous solutions of CCMV CPs (but no RNA) with the pH and ionic strength levels as the thermodynamic variables [?]. Under conditions of high ionic strength and reduced pH - which means increased attraction - empty capsids assemble spontaneously. This does not happen under conditions of neutral pH, when the negative charge of the CP head groups apparently is just large enough to overcome the attractive interactions. 

The assembly of RNA-containing virions is normally described in terms of kinetic pathways (e.g., [?]). Numerical studies of simple models for virion assembly [28] show two distinct pathways. If the net CP-CP attraction is large compared to the CP-RNA binding energy, then assembly proceeds via a classical nucleation-and-growth pathway. If the net CP-CP attraction is small compared to the CP-RNA binding energy, then assembly proceeds via a form of collective condensation (“en masse”). 


B. Equilibrium self-assembly of virions 

This article was motivated by a recent series of self-assembly experiments on CCMV virions carried out under a protocol that maintained, as closely as possible, thermodynamic equilibrium during assembly (e.g., [9]). Equilibrium thermodynamics has already been extensively applied to the self-assembly of empty capsids (see also Supplemental Material, Sec. IV [30]). According to equilibrium thermodynamics, the onset of capsid assembly as a function of the solution concentrations of the molecular components has the character of a phase transition in the limit that the number of molecular components per aggregate is large compared to one. The critical CP concentration for this transition, sometimes called the “critical micelle concentration (CMC)” [?] by analogy with the self-assembly of micelles, is determined by the condition that the chemical potential of a CP in solution is the same as that of a CP that is part of a capsid. The predictions of equilibrium thermodynamics agree well with chromatography studies of the self-assembly of empty capsids of CCMV [29] and other viruses. The characteristic energy scale for the CP-CP interactions in a CCMV capsid at neutral pH was in the range of just a few $k_BT$, indicating that the repulsive electrostatic interactions between CPs indeed are closely balanced by the attractive interactions. 
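
The sharp onset of empty-capsid assembly at a critical concentration can be illustrated with the textbook mass-action model of micelle formation. This is a minimal sketch under stated assumptions, not the treatment of the Supplemental Material: the solution is taken as ideal, concentrations are mole fractions, only free CPs and complete capsids of `n` subunits are allowed, and `beta_g` (the binding free energy per CP in units of $k_BT$) is a hypothetical parameter whose value below is purely illustrative.

```python
import math


def assembled_fraction(x_total: float, n: int, beta_g: float) -> float:
    """Fraction of CPs found in n-subunit capsids at total CP mole fraction x_total.

    Assumes ideal-solution mass action, x_total = x1 + n * (x1 / x_c)**n, with x1 the
    free-CP mole fraction and x_c = exp(beta_g) the critical concentration.
    beta_g < 0 is a hypothetical binding free energy per CP in units of k_B T.
    """
    x_c = math.exp(beta_g)

    def total(x1: float) -> float:
        if x1 <= 0.0:
            return 0.0
        # Evaluate the capsid term in log space and cap it to avoid overflow.
        log_term = min(n * math.log(x1 / x_c), 700.0)
        return x1 + n * math.exp(log_term)

    lo, hi = 0.0, x_total
    for _ in range(200):  # bisection on the monotonically increasing total(x1)
        mid = 0.5 * (lo + hi)
        if total(mid) > x_total:
            hi = mid
        else:
            lo = mid
    x1 = 0.5 * (lo + hi)
    return 1.0 - x1 / x_total


if __name__ == "__main__":
    x_c = math.exp(-10.0)
    for ratio in (0.1, 0.5, 1.0, 2.0, 10.0):
        frac = assembled_fraction(ratio * x_c, n=180, beta_g=-10.0)
        print(f"x_total / x_c = {ratio:5.1f}: assembled fraction = {frac:.3f}")
```

For $n = 180$ the assembled fraction stays negligible below the critical concentration and rises steeply above it, which is the phase-transition-like behavior referred to in the text.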

Under conditions of neutral pH and physiological salt concentrations, empty CCMV capsids do not form in solutions of CCMV CPs, but addition of vRNA molecules leads to virion assembly when the pH is reduced [?]. This stabilization of assembly by the vRNA molecules would seem to be obvious on the basis of straightforward electrostatic considerations: if the positively charged tail groups of the CPs are neutralized by the negatively charged vRNA molecules, then this should reduce the electrostatic repulsion with respect to the hydrophobic attraction, and hence trigger assembly. For CCMV at least, this argument is invalid. First, CCMV CPs whose tails have been removed do not assemble under conditions of neutral pH [32]. If neutralization of the tail groups were a sufficient condition for assembly, then this should have happened. Next, we already saw that CCMV CPs have a dipolar charge distribution, and association of the tail group of a CCMV CP with an ssRNA molecule actually increases the total negative charge of the assembly (since $Z_h$ exceeds $Z_t$ at neutral pH), which strengthens the electrostatic repulsion. At the isoelectric point, where $Z_h = Z_t$, two CPs can crudely be treated as oriented electrostatic dipoles. In that case, neutralization of the positive charges of the dipoles still increases the net repulsion between two CPs for larger separations. An additional source of attractive interactions clearly is required for vRNA-triggered self-assembly of CCMV virions. This additional source of attraction will be assumed to be the condensation of vRNA molecules induced by the CP tail groups, as we will now discuss. 


C. Condensation of single-stranded nucleotide chains 

It is well known that double-stranded (ds) $\lambda$ phage B-DNA molecules in aqueous saline solutions condense into rodlike and toroidal aggregates when low concentrations of condensing agents are added to the solution [?]. The condensing agents - which can be neutral or positively charged polyvalent ions - generate an effective short-range attraction between ds DNA molecules [34]. When low concentrations of poly-L-lysine are added to solutions containing plasmid-length single-stranded DNA molecules, condensation is observed as well [?]. These condensates are in fact, under the same conditions, significantly smaller and more stable than their dsDNA counterparts. They have a disordered, spherical appearance under TEM [35, 36] and they tend to aggregate together. Which of these two different modes of condensation prevails is believed to be determined by the persistence length. The persistence length of ssDNA chains is roughly a factor 50 smaller than that of ds chains. Numerical simulations of linear homopolymers with self-attraction report that, with increasing persistence length, a structural transformation takes place in the morphology of the condensates from a disordered, spherical globule to an ordered toroidal condensate [?]. 

Disordered spherical condensates appear in solutions of flexible, charged polymers (“polyelectrolytes”) to which polyvalent ions have been added as condensing agents. In polymer physics the appearance of such condensates is viewed as an example of the “coil-to-globule” transition [39]. In this article, we will assume that swollen ss vRNA molecules in solution condense via a coil-to-globule transition when condensing agents are added and that CPs in general, and the tail groups in particular, act as the vRNA condensing agents. 


D. Equilibrium assembly diagram 

FIG. 3. (Color online) Equilibrium assembly diagram of a small RNA virus. The vertical axis $\epsilon$ is the binding energy of a capsid protein (CP) to the vRNA molecule. The horizontal axis $u$ is the strength of attractive CP/CP pairing interactions. Progressive pH reduction at fixed salinity roughly corresponds to a horizontal path in this diagram. Schematic boundaries between the different regimes, indicated in light gray, are a guide to the eye only. 

Based on the model discussed in the next sections, a schematic equilibrium assembly diagram is obtained, shown in Fig. 3. The vertical axis $\epsilon$ is the binding affinity between a CP and a vRNA molecule - including correlation effects [?] - while the horizontal axis $u$ is the strength of attractive CP/CP pairing interactions. If both of these parameters are small compared to $k_BT$, then the vRNA molecules are swollen and most CPs are free in solution (not shown). Increasing $\epsilon$ increases the number of CPs associated with a vRNA molecule, 






which effectively reduces the solvent quality. At a threshold $\epsilon(\phi_{CP})$ - which depends on the total CP concentration $\phi_{CP}$ - the solution disproportionates into condensed CP-rich “saturated aggregates” and swollen CP-poor vRNA molecules. Disproportionation is similar to phase separation but without the appearance of phase boundaries [22, 40]. Instead, the solution is a uniform mixture of two different populations of aggregate species in thermal equilibrium with each other, as further discussed in Sect. II D. In the present case, one species is composed of swollen vRNA molecules with a small number of associated CPs, while the other species is composed of aggregates of condensed vRNA molecules surrounded by a layer of head groups in a liquid-like state (the provirion 1 state in Fig. 4). The CPs are forced out of the interior of a condensed vRNA globule by a combination of the surface tension of the globule and the angle-dependent pairing attraction between the CPs. 



FIG. 4. (Color online) Provirion 1 and provirion 2 states before (top) and after (bottom) solidification. The number of capsid proteins (CPs) in the first layer of the provirion 1 state whose tails are strongly associated with the RNA molecule(s) is comparable to that of the virion. The interior is negatively charged. The tails of the excess CPs in the second layer have a much weaker association with RNA, either due to the interaction of their tails with the outer side of the first CP layer, or because their tails are forced to squeeze in between the CPs of the first layer to access the RNA. In the provirion 2 state, the number of CPs that are strongly associated with RNA is larger than that of a virion. The tail groups fully neutralize the RNA molecule(s). 

In the provirion 1 state, the condensed, spherical vRNA molecule is surrounded by (roughly) 180 CPs whose tail groups are associated with the vRNA molecule. The tail groups do not neutralize the vRNA molecule, so the interior has a net negative macroion charge. Excess CPs either are free in solution or physisorbed on the surface of the provirion. The two populations of bound and free CPs are in thermal equilibrium with each other. If the strength $u$ of the pairing attraction between the CPs is increased, then the CPs of the inner layer of the provirion 1 state crystallize out into a $T = 3$ shell of exactly 180 CPs (see Fig. 4). If $\epsilon$ is increased for fixed (small) $u$, then a second transition is encountered beyond which the CP tail groups neutralize the vRNA molecule. This is the provirion 2 state, which has a significantly larger surface area than the provirion 1 state but a similar volume. It has zero surface tension and is subject to shape fluctuations. Solidification starting from the provirion 2 state is expected to lead to “malformed shells” composed of more than 180 CPs (see Fig. 4). The existence of a provirion 2 state is one of the central predictions of the theory. Provirion 2 aggregates would be a novel application area for the physics of strongly fluctuating interfaces [?], developed originally for surfaces and interfaces composed of amphiphilic molecules. However, the action of the CPs is not due to the competition between the hydrophobic and hydrophilic parts of an amphiphilic molecule but due to the affinity of the positively charged tail groups for the interior of the condensed vRNA molecules and of the negatively charged head groups for the exterior. If provirion 2 particles are indeed found, then one might say that CPs act as “amphielectrics”. 

An essential claim of the proposed model is that virion assembly for larger values of $\epsilon$ does not follow the conventional theory of self-assembly [?] and the Law of Mass Action (see also Supplemental Material, Sec. IV [30]) of equilibrium chemical thermodynamics. The key point of the model is that virion assembly will take place from a pre-condensed CP/vRNA aggregate for sufficiently large $\epsilon$. The CP concentration inside such an aggregate can be very high, even when the CP solution concentration is very low, provided that $\epsilon$ is large enough to offset the low CP solution chemical potential and allow aggregate formation. The final assembly of the virion from this pre-condensed state for increasing $u$ is then independent of the solution CP concentration. In essence, the aggregate acts as a chemical reactor that concentrates the components. 

If the strength $u$ of the attractive interactions is increased for $\epsilon < \epsilon(\phi_{CP})$, then there is no disproportionation. Instead, virion assembly takes place directly from the swollen coil phase and follows the conventional Law of Mass Action scenario of empty capsid assembly. We speculate - but have not shown - that in terms of assembly kinetics, a direct transition from the coil phase to the virion phase for lower $\epsilon$ obeys the nucleation-and-growth scenario. In contrast, virion assembly starting from the provirion 1 and 2 states - so for larger $\epsilon$ - is expected to proceed via some form of the “en masse” kinetic scenario. An interesting aspect of the phase diagram of Fig. 3 is the fact that it includes the provirion 1 structure as a stable structure. If an assembly experiment were to produce a structure like the provirion 1, then this would normally be interpreted as a “kinetic trap”. That does not mean that there are no kinetically trapped states in the proposed model. If an equilibrium phase diagram includes multiple competing structures separated by first-order transition lines - as is the case for the proposed model - then this only enhances kinetic trapping. The equilibrium assembly of virions requires in general a very fine balance between competing nonspecific interactions. 

In Sect. II, we present the simplest version of the model that describes the condensation of vRNA molecules as a coil-to-globule transition induced by CPs. Section III extends the model to include capsid formation deeper in the condensed phase. In the concluding Sect. IV we compare the model with the recent experiments on the equilibrium assembly of CCMV, and discuss experiments that would help to verify (or disprove) the model. We conclude with a discussion of the limitations of the model and how it could be extended further. For the convenience of the reader, a table of symbols used in the paper is provided in the Supplemental Material [30]. 


II. COIL-TO-GLOBULE TRANSITION 

In its simplest form, the model is a variational free energy for a homogeneous CP/RNA aggregate in terms of the radius of gyration $R$ of the aggregate, the maximum ladder distance $S$ (or MLD), defined as the maximum number of complementary paired nucleotides separating two points of the RNA molecules [?], and the segment occupation probability $x$. The latter is defined as the probability that a segment of the vRNA molecule is associated with the tail group of a CP. The variational free energy $F(R, S, x)$ is defined as 

$$\beta F(R, S, x) = \frac{R^2}{l^2 S} + \frac{S^2}{N} + V(x)\frac{N^2}{R^3} + W\frac{N^3}{R^6} - N x \beta\epsilon + N\left[x \ln x + (1 - x)\ln(1 - x)\right] + \beta F_{PB}(R) \qquad \text{(II.1)}$$

with $\beta = 1/k_BT$. The different terms will be explained in sequence. 



A. Coil-to-globule transition of annealed branched polymers 

The first four terms together constitute the variational free energy of an annealed branched homopolymer in the Flory approximation [?]. The branched homopolymer representation for vRNA molecules was developed in Ref. [?], where the secondary structure of vRNA molecules was approximated as a collection of $N$ identical rigid segments of length $l$ connected by freely-jointed triple junctions into a tree-like structure. Analysis of RNA secondary structures [?] indicates that a reasonable choice for $l$ is about six nucleotides. For a 4,000 base vRNA molecule, the number of segments $N$ is then in the range of $10^3$ (more details are provided in Supplemental Material, Sec. I [30]). The different possible configurations of the branched polymer represent the different possible secondary structures. Numerical evaluation of the enthalpy of vRNA molecules shows that there is a very large number of secondary structures with enthalpy within $k_BT$ of the ground state [?]. Two examples of tree-like structures with the same number of segments ($N = 21$) but different MLDs (6 and 11, respectively) are shown in Fig. 5. 

FIG. 5. (Color online) Different realizations of branched tree structures composed of $N = 21$ segments. The first structure has a maximum ladder distance $S = 6$ while $S = 11$ for the second structure. Both are indicated in red. 

Returning to Eq. (II.1), the first term is the entropic elastic free energy of a linear homopolymer of $S$ segments with a radius of gyration $R$. The second term is the conformational entropic free energy of an $N$-segment branched polymer whose MLD equals $S$. The third and fourth terms represent the interactions between the segments expressed in the form of a virial expansion in powers of the segment density $N/R^3$. The coefficient $V$ of the second-order term, which has the dimension of volume, is typically of the order of $l^3$. It can be positive (“good solvent”) or negative (“bad solvent”). The coefficient $W$ of the third-order term - which represents the strength of three-body interactions - must be positive to ensure thermodynamic stability. It is typically of the order of $l^6$. Minimization of the sum of the first four terms with respect to $S$ and $R$ leads to a smooth coil-to-globule condensation transition around $V = 0$. In the good solvent case, the radius of gyration scales with the number of segments $N$ as $R(N) \propto N^{7/13}$, which means that the swollen, or coil, state has a fractal geometry. In the condensed phase, with negative $V$, the globule size scales as a compact object with $R(N) \propto N^{1/3}$. In either case, the MLD is determined by the radius of gyration through $S(R) \sim (N(R/l)^2)^{1/3}$. It follows that condensation decreases the MLD, thus increasing the amount of branching. 
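
The Flory scaling of an annealed branched polymer can be checked numerically. The sketch below assumes the branched-polymer part of the free energy has the form $R^2/(l^2 S) + S^2/N + V N^2/R^3 + W N^3/R^6$ with all numerical prefactors set to one, which is an assumption made here; lengths are measured in units of $l$. The MLD is eliminated analytically, the radius is found by a one-dimensional search, and the scaling exponent is read off from two values of $N$. Function names are illustrative.

```python
import math


def beta_f_flory(radius: float, n: float, v: float, w: float, l: float = 1.0) -> float:
    """Branched-polymer free energy with the MLD S eliminated analytically."""
    # d/dS [R^2/(l^2 S) + S^2/N] = 0  gives  S^3 = N (R/l)^2 / 2
    s = (n * (radius / l) ** 2 / 2.0) ** (1.0 / 3.0)
    elastic = radius ** 2 / (l ** 2 * s) + s ** 2 / n
    virial = v * n ** 2 / radius ** 3 + w * n ** 3 / radius ** 6
    return elastic + virial


def optimal_radius(n: float, v: float, w: float, l: float = 1.0) -> float:
    """Minimize over R by ternary search in ln R (the function has a single minimum)."""
    lo, hi = math.log(0.1 * l), math.log(n * l)
    for _ in range(300):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if beta_f_flory(math.exp(m1), n, v, w, l) < beta_f_flory(math.exp(m2), n, v, w, l):
            hi = m2
        else:
            lo = m1
    return math.exp(0.5 * (lo + hi))


def scaling_exponent(v: float, w: float, n_small: float = 1.0e4, n_large: float = 1.0e6) -> float:
    """Effective exponent nu in R ~ N^nu between two (large) segment numbers."""
    r_small = optimal_radius(n_small, v, w)
    r_large = optimal_radius(n_large, v, w)
    return math.log(r_large / r_small) / math.log(n_large / n_small)


if __name__ == "__main__":
    print(f"good solvent (V = +1): nu = {scaling_exponent(1.0, 1.0):.3f}  (7/13 = {7 / 13:.3f})")
    print(f"bad solvent  (V = -1): nu = {scaling_exponent(-1.0, 1.0):.3f}  (1/3 = {1 / 3:.3f})")
```

The segment numbers used here are much larger than the $N$ of a real vRNA molecule; they are chosen only so that the asymptotic exponents are clearly visible.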


B. Mixing entropy 


An important feature of the model is the fact that the second virial coefficient $V(x)$ of the branched polymer depends on the probability $x$ that a segment is associated with a CP. In the model, precisely one segment can associate with the tail of one CP, so the maximum number of CPs that can associate with the branched polymer equals $N$. We will call such an $x = 1$ aggregate a “saturated aggregate”. Because CP-free vRNA molecules are known to be swollen under conditions of neutral pH and physiological salt concentrations, $V_0 = V(x = 0)$ will be assumed to be positive. On the other hand, in order for the CPs to act as condensing agents for vRNA molecules, $V_1 = V(x = 1)$ should be negative. The model assumes a linear interpolation $V(x) = V_0 - x(V_0 - V_1)$ between these two limits. The bare second virial coefficient $V_0$ of a branched flexible polyelectrolyte is roughly estimated as $l^3$; the second virial coefficient $V_1$ of a saturated aggregate, a more complex quantity, is discussed and estimated in Supplemental Material, Sec. II [30]. 
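
The role of the occupancy $x$ can be illustrated with a few helper functions. The first two implement the linear interpolation for $V(x)$ and the occupancy at which it changes sign. The last one is a Langmuir-type estimate of the occupancy obtained by balancing only the binding and mixing-entropy terms of the free energy against an ideal-solution chemical potential $\mu = k_BT \ln \phi_{CP}$; neglecting the $x$-dependence of the virial and electrostatic terms is a simplification made here, and all parameter values are illustrative.

```python
import math


def second_virial(x: float, v0: float, v1: float) -> float:
    """Linear interpolation V(x) = V0 - x (V0 - V1) between the bare and saturated limits."""
    return v0 - x * (v0 - v1)


def theta_occupancy(v0: float, v1: float) -> float:
    """Occupancy at which V(x) changes sign (requires V0 > 0 > V1)."""
    return v0 / (v0 - v1)


def mixing_entropy_per_segment(x: float) -> float:
    """x ln x + (1 - x) ln(1 - x), the mixing term per segment in units of k_B T."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return x * math.log(x) + (1.0 - x) * math.log(1.0 - x)


def langmuir_occupancy(phi_cp: float, beta_eps: float) -> float:
    """Occupancy from ln(x / (1 - x)) = beta*eps + ln(phi_cp); ideal dilute solution assumed."""
    weight = phi_cp * math.exp(beta_eps)
    return weight / (1.0 + weight)


if __name__ == "__main__":
    print(f"V changes sign at x = {theta_occupancy(1.0, -3.0):.2f} for V0 = 1, V1 = -3")
    for beta_eps in (5.0, 10.0, 15.0, 20.0):
        print(f"beta*eps = {beta_eps:4.1f}: x = {langmuir_occupancy(1.0e-6, beta_eps):.4f}")
```

Even at a CP volume fraction of $10^{-6}$, the occupancy approaches saturation once $\beta\epsilon$ exceeds roughly $\ln(1/\phi_{CP}) \approx 14$, which is the sense in which a large binding affinity can offset a low solution chemical potential.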


Returning to Eq. (II.1), the fifth term represents the binding affinity of a CP with a segment, while the sixth term is the entropy of distributing $Nx$ different CPs over $N$ different segments. 


C. Electrostatic free energy 


The last term $F_{PB}$ of the variational free energy is the electrostatic free energy of the aggregate as obtained from Poisson-Boltzmann (PB) theory (see Supplemental Material, Sec. III [30] for a discussion of PB theory in the context of the model). The macroion charge distribution is assumed to be as follows. The tail group is assigned a charge $eZ$ and the head group a charge $-eZ$, with $Z \approx 10$, while the segments of the branched polymer are assigned a negative charge of $-eZ$, so one tail group can neutralize one segment. The total macroion charge of the branched polyelectrolyte molecule equals $-NZ$ (in units of $e$), independent of the number of associated CPs. These macroions are placed in a monovalent salt solution with ion concentration $2c_s$. In PB theory, the electrostatic free energy of a macroion is determined by the charging parameter $\alpha$, which is defined as the ratio of the effective macroion charge $Q^*$ contained in a certain volume $\Omega$ over the number $2c_s\Omega$ of monovalent salt ions in that same volume in the absence of the macroion charge. The effective macroion charge differs from the bare charge because the monovalent salt ions can condense onto the macroion and thereby diminish the effective charge. Within PB theory, the effective charge per unit length of a highly charged polyelectrolyte molecule equals $-e/l_B$, where $l_B$ is the Bjerrum length defined by $e^2/\epsilon_0 l_B = k_BT$, with $\epsilon_0$ the dielectric constant of water. For the present case, the effective charge $Q^*$ of the branched polymer equals $-(e/l_B)Nl$. For a vRNA molecule of 4,000 nucleotides confined to a sphere with a radius $R$ of the order of 10 nm, the charging parameter $\alpha = |Q^*|/2\Omega(R)c_s$, with $\Omega(R) = (4/3)\pi R^3$, is of the order of one. 
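
The estimate that the charging parameter is of order one can be reproduced with a few lines. The sketch assumes a segment length of 1.8 nm (six nucleotides at roughly 0.3 nm each), a Bjerrum length of 0.7 nm, and a salt concentration of 0.1 M; these numerical values and the function name are assumptions introduced for illustration, with $|Q^*|$ expressed in units of $e$.

```python
import math

AVOGADRO = 6.02214076e23   # 1/mol
NM3_PER_LITER = 1.0e24     # nm^3 in one liter


def charging_parameter(n_segments: float, segment_length_nm: float, radius_nm: float,
                       salt_molar: float, bjerrum_nm: float = 0.7) -> float:
    """alpha = |Q*| / (2 c_s Omega) with |Q*| = N l / l_B in units of e."""
    c_s = salt_molar * AVOGADRO / NM3_PER_LITER            # ions of each sign per nm^3
    q_eff = n_segments * segment_length_nm / bjerrum_nm    # effective charge after condensation
    omega = 4.0 / 3.0 * math.pi * radius_nm ** 3           # confining volume in nm^3
    return q_eff / (2.0 * c_s * omega)


if __name__ == "__main__":
    n_segments = 4000 / 6      # 4,000 nucleotides at six nucleotides per segment
    alpha = charging_parameter(n_segments, segment_length_nm=1.8, radius_nm=10.0, salt_molar=0.1)
    print(f"charging parameter alpha = {alpha:.1f}")
```

Because $\alpha$ scales as $R^{-3}$, a swollen coil of twice the radius has a charging parameter eight times smaller, so condensation moves the aggregate from the weak-charging toward the strong-charging regime.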

The PB electrostatic free energy of capsid assembly has been extensively discussed (e.g., [45, 46]). In the limits of small and large charging parameters it is given by (see Supplemental Material, Sec. III C [30]) 

$$\beta F_{PB}(R) \simeq \begin{cases} \ldots, & \alpha(R) \ll 1 \\ \ldots, & \alpha(R) \gg 1 \end{cases} \qquad \text{(II.2)}$$

Here, $\kappa^2 = 8\pi e^2 c_s/(\epsilon_0 k_BT)$ is the square of the Debye screening parameter. Note that, in the weak-charging limit, $F_{PB}(R)$ has the same form as the second virial term in Eq. (II.1) [?]. 
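
The two charging regimes can be illustrated with the Donnan (uniform-potential) approximation, a standard simplification of PB theory for a uniformly charged region in equilibrium with a salt reservoir. This is a sketch under that assumption and is not claimed to reproduce the expressions of Eq. (II.2). In this approximation the dimensionless potential satisfies $\sinh\psi = \alpha$, and integrating the potential over the charging process gives $\beta F/|Q^*| = \sinh^{-1}\alpha - (\sqrt{1+\alpha^2} - 1)/\alpha$.

```python
import math


def donnan_free_energy_per_charge(alpha: float) -> float:
    """beta*F / |Q*| in the Donnan approximation, for charging parameter alpha > 0."""
    return math.asinh(alpha) - (math.sqrt(1.0 + alpha * alpha) - 1.0) / alpha


def weak_charging_limit(alpha: float) -> float:
    """Leading behavior for alpha << 1: beta*F / |Q*| ~ alpha / 2."""
    return alpha / 2.0


def strong_charging_limit(alpha: float) -> float:
    """Leading behavior for alpha >> 1: beta*F / |Q*| ~ ln(2 alpha) - 1."""
    return math.log(2.0 * alpha) - 1.0


if __name__ == "__main__":
    for alpha in (1.0e-3, 0.1, 1.0, 10.0, 1.0e4):
        print(f"alpha = {alpha:8.3g}: Donnan = {donnan_free_energy_per_charge(alpha):.5g}, "
              f"weak = {weak_charging_limit(alpha):.5g}, "
              f"strong = {strong_charging_limit(alpha):.5g}")
```

In the weak-charging limit the Donnan result reduces to $\beta F \simeq Q^{*2}/(4 c_s \Omega)$, which scales as $N^2/R^3$ and therefore has the same form as a second virial term, consistent with the remark above.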


D. Phase diagrams 

In order to obtain the phase diagram, $F(R, S, x)$ is first minimized with respect to $R$ and $S$ for fixed occupancy $x$. The resulting free energy $F(x)$ has in general either one minimum or two minima separated by a maximum. For values of $x$ near the maximum, the system is thermodynamically unstable and the solution decomposes into aggregates with different values of $x$. As the magnitude $-V_1$ of the negative second virial coefficient is reduced, the two minima of $F(x)$ approach each other and merge at a critical value $V_c$. Define $\langle x \rangle$ to be the mean occupancy, i.e., the average of the microscopic variable $x$ over all aggregates in solution. The mean occupancy is determined by the condition of phase equilibrium between CPs that are associated with the branched polyelectrolyte and those that are free in solution. Equating the chemical potential $\mu$ of the CPs in solution to the derivative of the free energy of an aggregate with respect to the number $xN$ of associated CPs leads to a condition from which the mean occupancy $\langle x \rangle$ can be obtained by a common-tangent construction. 
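
A common-tangent construction can be carried out numerically by taking the lower convex hull of the free energy per segment. The sketch below uses a deliberately simplified free energy that is an assumption of this example, not the full model: the large-$N$ limit is taken, electrostatics is omitted, the coil contribution is neglected, and for $V(x) < 0$ the globule contributes $-V(x)^2/4W$ per segment (the minimum of $Vn + Wn^2$ over the segment density $n$). Terms linear in $x$, such as the binding term, do not shift the tangent points and are left out. Virial coefficients are in units of $l^3$ and $l^6$.

```python
import math


def f_per_segment(x: float, v0: float, v1: float, w: float) -> float:
    """Simplified large-N free energy per segment in units of k_B T (see assumptions above)."""
    entropy = x * math.log(x) + (1.0 - x) * math.log(1.0 - x)
    v = v0 - x * (v0 - v1)                       # linear interpolation of the virial coefficient
    globule = -v * v / (4.0 * w) if v < 0.0 else 0.0
    return entropy + globule


def lower_hull(points):
    """Lower convex hull of points sorted by x (monotone-chain algorithm)."""
    hull = []
    for px, py in points:
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (py - oy) - (ay - oy) * (px - ox)
            if cross <= 0.0:
                hull.pop()                       # middle point lies on or above the chord
            else:
                break
        hull.append((px, py))
    return hull


def coexistence_interval(v0: float, v1: float, w: float, n_grid: int = 4001):
    """Return (x_minus, x_plus) of the common tangent, or None if F(x) is convex."""
    xs = [(i + 0.5) / n_grid for i in range(n_grid)]
    hull = lower_hull([(x, f_per_segment(x, v0, v1, w)) for x in xs])
    width, x_minus, x_plus = max((b[0] - a[0], a[0], b[0]) for a, b in zip(hull, hull[1:]))
    if width < 2.5 / n_grid:                     # only grid-sized gaps: no decomposition
        return None
    return x_minus, x_plus


if __name__ == "__main__":
    print("V1 = -0.5:", coexistence_interval(1.0, -0.5, 1.0))
    print("V1 = -3.0:", coexistence_interval(1.0, -3.0, 1.0))
```

In this simplified setting a weakly negative $V_1$ gives a convex free energy and no decomposition, while a more strongly negative $V_1$ opens a coexistence interval between a CP-poor and a nearly saturated population.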


1. Two types of disproportionation phase diagrams 

The shape of the resulting phase diagram depends crucially on the charging parameter. For the weak-charging regime and $\beta\epsilon$ large compared to one, the phase diagram is shown in Fig. 6. In this regime, the mean occupancy $\langle x \rangle$ can be equated to the macroscopic CP to RNA concentration ratio $X = \phi_{CP}/N\phi_{RNA}$, normalized so that $X = 1$ corresponds to the concentration ratio of a saturated aggregate. We will assume that $X$ is less than or equal to one (as is the case for the experiments discussed in the conclusion). 



FIG. 6. (Color online) Disproportionation of a mixture of CPs and RNA molecules for large binding affinities $\epsilon$ (weak-charging regime). The horizontal axis is the CP to vRNA concentration ratio $X$. The vertical axis $-V_1$ is minus the effective second virial coefficient of a saturated globule. If $-V_1$ exceeds the critical value $-V_c$, then phase decomposition takes place for mixing ratios in the interval $X_- < X < X_+$. The solid dot indicates a critical point. 

The horizontal axis is the CP to vRNA concentration ratio and the vertical axis is the negative of the second virial coefficient $V_1$ of a saturated aggregate. The solid dot indicates a critical point $V_1 = V_c$ that marks the onset of the phase decomposition [48]. The interval of phase decomposition widens as the strength of the negative second virial coefficient increases. 
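
Inside the decomposition interval, the relative size of the two populations follows from conservation of CPs (the lever rule). The sketch below assumes that essentially all CPs are bound, so that the mean occupancy equals $X$ as stated for this regime; the boundary values passed to the function are placeholders, and the function names are illustrative.

```python
def concentration_ratio(phi_cp: float, phi_rna: float, n_segments: int) -> float:
    """Normalized CP to vRNA concentration ratio X = phi_CP / (N * phi_RNA)."""
    return phi_cp / (n_segments * phi_rna)


def cp_rich_fraction(x_ratio: float, x_minus: float, x_plus: float) -> float:
    """Fraction of vRNA molecules in the CP-rich population, from the lever rule."""
    if x_ratio <= x_minus:
        return 0.0          # single population of CP-poor aggregates
    if x_ratio >= x_plus:
        return 1.0          # single population of CP-rich aggregates
    return (x_ratio - x_minus) / (x_plus - x_minus)


if __name__ == "__main__":
    # Placeholder boundaries of the decomposition interval, for illustration only.
    x_minus, x_plus = 0.1, 0.9
    for x_ratio in (0.05, 0.3, 0.5, 0.7, 0.95):
        frac = cp_rich_fraction(x_ratio, x_minus, x_plus)
        print(f"X = {x_ratio:.2f}: CP-rich fraction = {frac:.2f}")
```

As $X$ increases across the interval, the occupancies of the two populations stay fixed at $X_-$ and $X_+$ while the CP-rich fraction grows linearly from zero to one.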

This phase diagram closely resembles that of the phase separation of
a polymer solution into dense and dilute phases when the solvent
quality changes from good to bad. There is however an important
difference in terms of interpretation. If the solvent quality is
reduced in a polymer solution then the formation of globules
typically induces macroscopic phase separation, as is the case when
condensing agents are added to a solution containing ssDNA molecules
[15, 36]. However, macroscopic phase separation does not (and should
not!) occur during viral assembly. The reason is that the aggregates
remain highly charged since the CPs, acting as the condensing agents,
are charge neutral (at least in the model). From this it follows that
the CP-rich and CP-poor moieties in the two-phase region of the phase
diagram remain mixed together in a dispersed state. Decomposition
without macroscopic phase separation is in fact well known from the
literature on complexation of oppositely charged polyelectrolytes as
disproportionation [22, 40], and we have adopted this usage.

The phase diagram in the strong-charging regime is shown in Fig. 7.
The critical point has been replaced by a sharp first-order
coil-to-globule transition along the saturated globule line $X = 1$.
For $X$ less than one, this transition broadens out into a wedge of
phase decomposition, much like any phase transition of a
single-component material tends to broaden into a phase-coexistence
interval when impurities are mixed in. Note the surprising
“re-entrance”: if $-V_1$ is increased starting from the coil phase for
$X$ near one then disproportionation appears, disappears, and then
reappears.

FIG. 7. (Color online) Same as Fig. 6 but now in the strong-charging
regime.

2. Coil-to-globule transition 

As the binding affinity is reduced, the phase diagram becomes
dependent not just on the concentration ratio but also on the total
concentrations. This is shown in Fig. 8, which displays the dependence
of the width of the two-phase region on $\epsilon$ and the CP
concentration $\phi_{CP}$ for the case that $-V_1$ is larger than the
critical value $-V_c$ (see Fig. 6). The vertical axis is the sum of
the CP to RNA binding affinity $\epsilon$ and $\mu_{CP}$, with
$\beta\mu_{CP} = \ln(\phi_{CP}/c_0)$. Here, $\phi_{CP}$ is the total CP
concentration and $c_0$ is the CP concentration for a densely packed
array of capsids. For $X \simeq 1$ the aggregate is in the condensed
globule state for larger $\beta\epsilon$. When $\beta\epsilon$ is reduced the
solution decomposes into one moiety with aggregates whose occupancy
$x = x_+$ is close to one (the globule state) and one moiety whose
occupancy $x_-$ is close to zero (the coil state). For sufficiently
low $\beta\epsilon$, the system is again in a one-phase region, but now
with most CPs in solution and with the vRNA molecules in the coil
state. The coil-to-globule transition is smeared out as a function of
$\beta\mu_{CP} = \ln(\phi_{CP}/c_0)$ because $\mu_{CP}$ is not the true CP
chemical potential (see Supplemental Material, Sect. V [30]). For
large $\epsilon/k_BT$, disproportionation is determined only by the
concentration ratio $X$, as we saw earlier.

FIG. 8. (Color online) Phase decomposition for fixed second virial
coefficient $V_1$. The vertical axis is the sum of the CP to RNA
binding affinity $\epsilon$ and $\mu_{CP}$, where
$\beta\mu_{CP} = \ln(\phi_{CP}/c_0)$. Here, $\phi_{CP}$ is the total CP
concentration and $c_0$ the CP concentration for a densely packed
array of capsids. The horizontal axis is the CP to RNA mixing ratio
$X$. The boundaries of decomposition $X_-$ and $X_+$ for large binding
affinities are those shown in Fig. 6.


E. Surface Segregation 

As $-V_1$ increases, the system enters deeper into the condensed
phase. A CP-RNA aggregate can no longer be treated as uniform when
this happens, because the head groups will segregate out to the
surface of the condensate. There are two reasons for this. First,
condensed globules have a surface tension $\gamma_0$ [39] (a simple
mean-field argument, given in the Supplemental Material, Sect. VI
[30], provides an estimate of $\beta\gamma_0$). Head groups located in the
interior effectively increase the globule surface area. Transferring
a head group from the interior to the surface lowers the free energy
by an amount $\gamma_0 D^2$, which we estimate to be of the order of
$k_BT(D/l)^2$. Second, because head groups transferred to the surface
are oriented by the surface (the tail groups remain bound to the RNA
interior), surface segregation also leads to a gain in
orientation-dependent attractive interaction between CPs. Figure 9
shows the fraction $\theta$ of CPs located on the surface as a function
of the dimensionless surface tension $\beta\gamma_0 D^2$ for different
values of the strength $u$ of the orientation-dependent attractive
interactions between adjacent CPs located on the surface. The curves
were obtained using a Langmuir surface-adsorption model with
attractive nearest-neighbor interaction for adsorbed particles (see
Supplemental Material, Sect. VIII [30] for details). The curves
marked a-c are lines of fixed strength for the attractive interaction
between surface-oriented CPs. Note that the curves resemble the
isotherms of the van der Waals gas. If $\theta$ is close to one, then
the CP layer has the character of a strongly correlated
two-dimensional (2D) fluid, while for $\theta$ close to zero it has the
character of a weakly correlated 2D gas. The transition between these
regimes can be smooth (case c) or discontinuous (case a). A critical
point located on the b isotherm separates the two regimes.

FIG. 9. (Color online) The fraction of CPs segregated to the surface
of a saturated aggregate as a function of the dimensionless surface
tension $\beta\gamma_0 D^2$ of the globule. The curves a-c correspond to
decreasing attraction $u$ between oriented head groups located on the
surface. Curve b corresponds to the critical isotherm.

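The van der Waals-like shape of these isotherms can be illustrated with a minimal sketch. It assumes a mean-field (Bragg-Williams) treatment of a lattice gas with attractive nearest-neighbor interactions, which is not necessarily the exact model of the Supplemental Material. The effective field `h` (standing in for the dimensionless driving force for surface segregation), the combined coupling `z_beta_u` (coordination number times $\beta u$), and both function names are illustrative assumptions.

```python
import math


def field_of_coverage(theta, z_beta_u):
    """Effective field h (in units of k_B T) that holds the surface
    coverage at theta, from theta/(1-theta) = exp(h + z_beta_u*theta)."""
    return math.log(theta / (1.0 - theta)) - z_beta_u * theta


def has_discontinuous_transition(z_beta_u, n=999):
    """True if h(theta) is non-monotonic on a uniform grid, i.e. the
    isotherm has a van der Waals loop and the coverage jumps."""
    thetas = [(i + 1) / (n + 1) for i in range(n)]
    h = [field_of_coverage(t, z_beta_u) for t in thetas]
    return any(h[i + 1] < h[i] for i in range(n - 1))
```

In this mean-field picture the slope of `h` at half coverage is `4 - z_beta_u`, so a loop (case a) appears only when the coupling exceeds 4, while weaker coupling gives a smooth isotherm (case c).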
Surface segregation leads to a geometrical conflict. Let $R$ be the
radius of a condensed globule with no head groups in the interior. If
the globule surface area $4\pi R^2$ is less than the area $ND^2$ of a
layer of close-packed CP head groups, then only a fraction of the CPs
of a saturated aggregate can be accommodated on the surface. The
surface of the vRNA globule of $T = 3$ ssRNA viruses accommodates
about 180 CPs while a swollen CCMV vRNA molecule can accommodate
about 300 CCMV CPs. One way to resolve the conflict is for the excess
CPs to be expelled into the surrounding solution at the expense of
losing an affinity $\epsilon$ per tail group. This corresponds to the
provirion 1 state of Sect. I. If $\epsilon$ is increased then breaking
the bond between CPs and the vRNA molecule becomes too costly.
Instead the surface area of the condensate can be increased to allow
access to the surface for more CPs. This corresponds to the provirion
2 state of Sect. I. In the next section we extend the model to study
the competition between the provirion 1 and 2 states, assuming that
surface segregation has taken place.

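The size of this geometrical mismatch can be made concrete with a short sketch based on the numbers quoted above (about 180 CPs on a $T = 3$ surface versus about 300 CPs for a saturated aggregate). The function names and the effective head-group size `D_eff`, which is back-computed here from the 10 nm inner capsid radius, are illustrative assumptions rather than quantities given in the text.

```python
import math


def surface_capacity(R, D):
    """Number of close-packed head groups of area D**2 that fit on a
    sphere of radius R (same length units for R and D)."""
    return 4.0 * math.pi * R**2 / D**2


def occupancy(n_surface, n_saturated):
    """Fraction of the CPs of a saturated aggregate that fit on the surface."""
    return n_surface / n_saturated


R_c = 10.0                                        # nm, inner capsid radius
D_eff = math.sqrt(4.0 * math.pi * R_c**2 / 180)   # nm, assumed head-group size
x_v = occupancy(180, 300)                         # occupancy of an assembled virion
```

With these inputs `x_v` evaluates to 0.6, so roughly 40% of the CPs of a saturated aggregate cannot be placed on a surface of radius `R_c`.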

III. EXTENDED MODEL: PROVIRION STATES 


A. CP-exchange equilibrium and surface phase diagram

The extended model is defined by separate free energies for the
surface and the interior. The surface free energy area density is
defined as

$$\beta f_s(\rho_2) = \rho_2 \ln\left(\frac{\rho_2 D^2}{1-\rho_2 D^2}\right) + B_2\rho_2^2 + 2\rho_2\ln(\ldots) - \beta\epsilon\rho_2 \qquad \text{(III.1)}$$

The first two terms of Eq. (III.1) constitute the van der Waals free
energy density of a two-dimensional system of disk-like particles
with area density $\rho_2$ and excluded area $D^2$. The last two terms
are, respectively, the CP electrostatic free energy in the
strong-charging limit (see Supplemental Material, Sect. III B [30])
and the CP-vRNA affinity. The (negative) second virial coefficient
$B_2$ represents the hydrophobic pairing attraction between
surface-segregated CP head groups. It depends on the angle
$\psi = D/R$ between the relative orientations of the two axes of
adjacent CPs (see figure) as

$$B_2(\psi) = -\exp\left[\beta u - (\psi-\psi_c)^2/\Delta\psi^2\right] \qquad \text{(III.2)}$$

Here, $u$ is, as before, the binding energy of the pairing attraction,
$\Delta\psi$ is the angular range of the pairing attraction, and
$\psi_c = D/R_c$ is the relative angle between the CPs of a completed
$T = 3$ capsid, with $R_c \simeq 10$ nm the inner radius of the shell.

The surface free energy $F_s = f_s(\rho_2)A$, with $A$ the globule
surface area, must be added to the interior free energy
$F_b = f_b(\rho_3, x)\Omega$, with $\Omega \simeq (4/3)\pi R^3$ the globule
volume, $\rho_3 = N/\Omega$ the interior segment density, and $x$ the
segment occupation probability. The interior free energy density of a
highly condensed globule also has the van der Waals form:

$$\beta f_b(\rho_3, x) = \rho_3\ln\left(\frac{\rho_3}{\rho_m-\rho_3}\right) - a x\rho_3^2 \qquad \text{(III.3)}$$

Here, $\rho_m$ is the maximum segment packing density, corresponding to
a hydrated crystal of duplex RNA [50]. If we demand that the radius
of a close-packed sphere of vRNA segments equals 9 nm then
$\rho_m l^3 \simeq 0.37$. The second term describes the
tail-group-mediated attractive interaction between RNA segments,
where $x = \rho_2(A/N)$ is the probability for a vRNA segment to be
occupied by a tail group. For provirion states with
$\rho_2 D^2 \simeq 1$, the occupancy $x \simeq A/(ND^2)$ reduces to a
dimensionless measure of the surface area. In the limit of small
$\rho_3$, the van der Waals free energy reduces to our earlier virial
expansion with a second virial coefficient $V(x) = (1/\rho_m) - ax$
that decreases linearly with the occupation probability. As in
Section II, the second virial coefficient will be assumed to change
sign as a function of $x$, separating states where the aggregate is
in good solvent (for smaller $x$) or in bad solvent (for larger $x$).
As before, the globule is assumed to have a surface tension
$\gamma_0(x)$ that is set by $V(x)$. However, the second virial
coefficient $V(x)$ for the surface-segregated state in general will be
different from that of the uniform globule state.


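As a consistency check on the van der Waals form used for the interior, the following sketch compares a numerical derivative of the free energy density with the closed-form equation of state. It assumes the reading $\beta f_b = \rho_3\ln[\rho_3/(\rho_m-\rho_3)] - a x\rho_3^2$ for the interior free energy density and the standard relation $\Pi = \rho\,\partial f/\partial\rho - f$; the function names and the parameter values used to exercise them are illustrative, not values from the paper.

```python
import math


def beta_f_b(rho, x, rho_m, a):
    """Assumed van der Waals free energy density of the interior (units of k_B T)."""
    return rho * math.log(rho / (rho_m - rho)) - a * x * rho**2


def beta_pressure_closed_form(rho, x, rho_m, a):
    """Van der Waals equation of state: rho/(1 - rho/rho_m) - a*x*rho**2."""
    return rho / (1.0 - rho / rho_m) - a * x * rho**2


def beta_pressure_numeric(rho, x, rho_m, a, h=1e-6):
    """Osmotic pressure from Pi = rho*df/drho - f, using a central difference."""
    dfdrho = (beta_f_b(rho + h, x, rho_m, a)
              - beta_f_b(rho - h, x, rho_m, a)) / (2.0 * h)
    return rho * dfdrho - beta_f_b(rho, x, rho_m, a)


def second_virial(x, rho_m, a):
    """Low-density second virial coefficient V(x) = 1/rho_m - a*x."""
    return 1.0 / rho_m - a * x
```

Under these assumptions the two pressure routines agree to numerical precision, and `second_virial` changes sign at `x = 1/(a*rho_m)`, which is the good-solvent to bad-solvent crossover referred to in the text.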
The next step is to impose thermodynamic equilibrium of the globule
surface and interior, both with respect to each other and with
respect to the surrounding solution. The head group surface area
density $\rho_2$ is determined by the condition of exchange or phase
equilibrium between CPs located on the globule surface and in the
surrounding solution. For simplicity, we will assume in this section
that the solution CP chemical potential $\mu$ is a fixed quantity.
Exchange equilibrium is satisfied if

$$\beta\,\partial f_s/\partial\rho_2 - a\rho_3 = \beta\mu \qquad \text{(III.4)}$$

The second term on the left hand side is due to the fact that the
interior free energy density, through $x$, also depends on the
surface CP density. A surface phase diagram can be obtained in terms
of the 2D surface pressure
$\Pi_2 = \rho_2\,\partial f_s/\partial\rho_2 - f_s(\rho_2)$ and the strength
$\exp(-\beta u)$ of the attractive interactions. The surface phase
diagram has a line of first-order transitions separating a liquid and
a gas phase, as shown in Fig. 10, ending at a critical point (CP)
where $\exp(\beta u) \simeq 3$. In a more complete model, this phase
diagram would also contain other phases with, minimally, a
solidification line ending at a triple point (TP), where it joins a
sublimation line (both marked as dashed lines). The horizontal red
line will be discussed below. We will restrict ourselves to the
high-density liquid phase with $\rho_2 D^2 \simeq 1$ and
$\rho_3/\rho_m \simeq 1$. The solution of Eq. (III.4) corresponding to
those conditions has a surface pressure
$\Pi_2 \simeq (\mu+\epsilon)/D^2$ that increases (approximately) linearly
with $(\mu+\epsilon)$ [52].

FIG. 10. (Color online) Phase diagram of the surface layer. (Solid
line) Liquid to gas transition, ending at a critical point (CP).
(Dashed lines) Solidification and sublimation lines ending at the
triple point (TP, not included in the model). The thermodynamic
surface tension vanishes along the red line.


B. Mechanical equilibrium

The next step is to impose mechanical equilibrium, which requires
that the total free energy is minimized with respect to $\rho_3$. This
leads to

$$\frac{2\left(\gamma_0(x)-\Pi_2\right)}{R} + \ldots = \Pi_3(\rho_3) \qquad \text{(III.5)}$$

Here, $\Pi_3 = -\partial F_b/\partial\Omega$ is the three-dimensional (3D)
osmotic pressure exerted by the interior on the surface layer. The
first term on the left hand side can be understood by noting that
$\gamma = \gamma_0 - \Pi_2(\rho_2)$ is the thermodynamic surface tension,
defined as $\gamma = \partial F/\partial A$, so $2\gamma/R$ can be interpreted
as a Laplace pressure. Under conditions of thermodynamic equilibrium,
the thermodynamic surface tension is related to the chemical
potential by the Gibbs isotherm $d\gamma = -\rho_2\,d\mu$, with $\mu$ again
the chemical potential.

The second term in Eq. (III.5) is a pressure that is generated by the
dependence of the second virial coefficient $B_2$ on angle, and hence
on $R$. The same term would have been obtained if we had included a
Helfrich bending energy in the surface energy with mean curvature
$2/R$ and spontaneous curvature $2/R_c$ (see also ref. [53]). Here,
$\beta\kappa_H = \exp(\beta u)/\Delta\psi^2$ acts as a dimensionless bending
modulus. Equation (III.5) can be extended to non-spherical surfaces
by replacing $2/R$ with the mean curvature. Finally, the pressure
$\Pi_3(\rho_3)$ in Eq. (III.5) exerted by the interior on the surface is
given by the usual van der Waals equation of state:

$$\beta\Pi_3(\rho_3) = \frac{\rho_3}{1-\rho_3/\rho_m} - a x\rho_3^2 \qquad \text{(III.6)}$$

We will only consider solutions of Eq. (III.5) with
$\rho_2 D^2 \simeq 1$ and $\rho_3/\rho_m \simeq 1$. The nature of the
solution depends in this case on which term dominates the left hand
side of Eq. (III.5).


1. Provirion 1 state


First assume smaller values for $\epsilon + \mu$ and larger values for
$u$, so the second term, the Helfrich pressure term, dominates over
the Laplace pressure term. The dominant Helfrich energy is minimized
if the shell adopts the geometry of a sphere with radius equal to the
spontaneous curvature radius (here $R_c$). The segment occupation
probability $x$ can be equated to $x_v = 4\pi R_c^2/(ND^2)$, the
occupation probability of an assembled virion (for CCMV, $x_v$ is of
the order of 0.6). A significant fraction of the RNA segments are not
associated with a tail group in this state, which reflects the
geometrical conflict we noted earlier. The second virial coefficient
has increased from $V(x=1)$ to $V(x=x_v)$ when $x$ is reduced from a
value close to one to $x_v$. Similarly, the bare surface tension of
the globule must be reduced, say to $\gamma_0(x_v)$. If $V(x_v)$ still
is negative, like $V(x=1)$, then the RNA material remains in the
condensed state. The interior osmotic pressure $\Pi_3$ exerted on the
shell can be neglected in this case. The thermodynamic surface
tension may be positive or negative, depending on the sign of
$\gamma_0(x_v) - \Pi_2(\epsilon+\mu)$. Surfaces with a negative surface
tension normally are thermodynamically unstable, but the Helfrich
bending energy can suppress this instability if the bending modulus
is sufficiently large. If, on the other hand, the second virial
coefficient $V(x_v)$ of the interior is positive then the interior is
in a good solvent state and exerts a positive osmotic pressure on the
shell. The bare surface tension is zero in this case. The combined
pressures $\Pi_3 + 2\Pi_2(\epsilon+\mu)/R_c$ must be absorbed by the
Helfrich bending energy.
2. Provirion 2 state

For increasing $\mu + \epsilon$ the surface pressure $\Pi_2$ rises. When
$2\Pi_2/R_c$ approaches $\kappa_H/R_c^3$ in magnitude then the Helfrich
bending energy is no longer able to compensate for the surface
pressure. The surface area is forced to expand until the occupation
probability reaches its maximum value $x = 1$. The interior remains
condensed since $x = 1$, so the interior pressure $\Pi_3$ can be set to
zero. Because the surface area $A \simeq ND^2$ now exceeds
$4\pi R_c^2$, the surface cannot remain spherical. By analogy with
similar problems in the physics of surfactants, we expect that the
mean curvature will remain close to $2/R_c$ over sections of the
surface that are bounded by lines of negative Gauss curvature where
that is not the case.

IV. SUMMARY AND CONCLUSION 

In this concluding section, we first compare a number of predictions
of the model with the outcome of recent experiments on self-assembly
of the CCMV virus. We then discuss predictions of the model that have
not yet been tested and conclude with the most important limitations
of the model.

A. Comparison with experiment 

The most distinctive prediction of the model in terms of experimental
tests concerns the optimal mixing ratio (OMR), defined as the minimum
value of the CP-to-vRNA concentration ratio $X$ for which all of the
vRNA molecules are packaged. The OMR has been measured through virion
assembly experiments in solutions that contained CCMV CPs and
non-CCMV vRNA molecules, using an assembly protocol aimed at
maintaining thermodynamic equilibrium. The non-native vRNA molecules
had the same length as that of CCMV vRNA molecules. Solutions with a
prescribed mixing ratio were first incubated at neutral pH and low
salinity, that is, with weak CP-CP pairing attraction. Cryo-EM images
of the solution revealed the formation of virion-sized complexes of
CP and RNA with irregular and disordered shapes. RNase digestion
assays showed that these disordered complexes did not protect the RNA
from degradation by RNase, so the complexes could not be stable
virions. CP-RNA binding was reversible and CPs could exchange between
different RNAs [54]. When CP-CP interactions were strengthened by
lowering the pH from 7.2 to 4.5, true virions formed from these
structures.

The results of electrophoresis runs for different mixing ratios $X$
[54] are shown in Fig. 11. The far left column shows the case of
native CCMV. The far right column shows the case of solutions
containing only RNA molecules, which move faster than native CCMV
virions during gel electrophoresis. For the case of CP to RNA weight
ratios $w$ below 6.0, a narrow band moves with a velocity slightly
less than that of CCMV virions. The weight ratio $w$ is related to the
mixing ratio $X$ of the previous sections by $w \simeq 6.0X$, so
$w \simeq 6.0$ corresponds to $X \simeq 1.0$. This suggests that
aggregates in this band at least resemble the CCMV virion. These
labile aggregates could not correspond to the provirion 2 state,
since the provirion 2 state is larger than the CCMV virion state, and
are candidates for the provirion 1 state.

FIG. 11. CP-RNA assembly titrations: gel retardation assays. Shown
are 1% agarose gels run at low pH and stained for RNA. At the left is
a titration of 3217-nt RNA1 molecules of the Brome Mosaic Virus (BMV)
with varying amounts of CCMV CP. The value of the CP to RNA weight
ratio $w$ is provided at the top of each lane. It ranges from 0
(right-most lane, RNA) to 6:1 (lane second from left). The weight
ratio $w$ is related to the mixing ratio $X$ of the text by
$w \simeq 6.0X$. The left-most lane shows the position of CCMV
virions. From ref. [54].

The faster RNA band has broadened out, extending from velocities
higher than that of the pure RNA molecule down to the CCMV-like band.
Aggregates in this smeared-out band are not packaged when the
interaction strength is increased, so $X$ is less than the OMR. If the
CP concentration is increased then the broad band disappears around a
concentration ratio of about 300 CPs per vRNA molecule. For a
positive tail charge of about $+10e$, this corresponds to an OMR of
$X = 1$.

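The arithmetic behind this estimate can be checked with a short sketch. It assumes one negative charge per nucleotide and takes $X$ to be the ratio of the total tail charge to the RNA charge; the function name `mixing_ratio` and the round RNA length of 3000 nt are illustrative assumptions, while 3217 nt is the BMV RNA1 length quoted in the caption of Fig. 11.

```python
def mixing_ratio(n_cp_per_rna, tail_charge, n_nucleotides):
    """Charge-based mixing ratio X: total positive tail charge per RNA
    divided by the RNA charge, assuming one negative charge per nucleotide."""
    return n_cp_per_rna * tail_charge / n_nucleotides


# About 300 CPs with roughly +10e tails per RNA molecule
X_round = mixing_ratio(300, 10, 3000)   # round-number RNA length (assumed)
X_bmv = mixing_ratio(300, 10, 3217)     # BMV RNA1 length from Fig. 11
```

Both values are close to one, in line with the statement that the OMR corresponds to charge neutralization of the vRNA molecule.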
This is a striking result. If one applies textbook self-assembly
theory, by directly minimizing the free energy of a solution of CPs
and vRNA molecules in the absence of any CP-induced vRNA
condensation, one obtains Fig. 12 (see Supplemental Material, Sect.
IV [30]). As a function of increasing CP concentration, capsid
assembly starts at a CMC, denoted by $\phi^*$, which is proportional to
the Boltzmann factor for inserting a CP into a virion shell. Beyond
$\phi^*$, the concentration of free CPs saturates while that of
assembled capsids increases linearly with the CP concentration. The
increase stops when the supply of vRNA molecules is exhausted, which
is the OMR at which (nearly) all vRNA molecules have been packaged.
The OMR is thus $X = M/N$, with $M$ the number of CPs of a $T = 3$
shell and $N$ the number of vRNA segments (see Supplemental Material,
Sect. VII [30] for a more detailed discussion).

FIG. 12. (Color online) Dependence of the concentration $\phi_f$ of
free capsid proteins and that of the capsid concentration $C(M)$ on
the total protein concentration $\phi_{CP}$ (left vertical axis)
according to textbook theory. Capsid assembly starts at the CMC
$\phi^*$ and terminates when the RNA supply has been exhausted at
$\phi_{CP} = M\phi_{RNA}$, with $M$ the number of CPs per virion.

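The textbook behavior described above can be summarized in a small sketch. It assumes an idealized, infinitely sharp CMC, so that every CP beyond `phi_star` goes into capsids until the RNA runs out; the function name, its arguments, and the piecewise form are illustrative assumptions rather than the expressions of the Supplemental Material.

```python
def textbook_assembly(phi_cp, phi_star, phi_rna, M):
    """Return (free CP concentration, capsid concentration) for total CP
    concentration phi_cp, CMC phi_star, RNA concentration phi_rna and
    M CPs per capsid, in the sharp-CMC idealization."""
    if phi_cp <= phi_star:
        # Below the CMC all CPs stay free and no capsids form.
        return phi_cp, 0.0
    # Above the CMC the excess CPs form capsids, one per RNA at most.
    capsids = min((phi_cp - phi_star) / M, phi_rna)
    free = phi_cp - M * capsids
    return free, capsids
```

In this idealization the capsid concentration stops growing at `phi_cp = phi_star + M * phi_rna`, which reduces to the condition quoted in the caption of Fig. 12 when `phi_star` is small compared with `M * phi_rna`.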

The value of the OMR predicted by the model follows from Figs. 6-8.
These show that the solution should disproportionate into CP-rich
globules and CP-poor vRNA molecules. Provided the CP-rich globules
are in the provirion 1 state, the globules should transform into
virions when the strength $u$ of the CP-CP pairing is increased and
the 2D liquid freezes into a $T = 3$ “crystal”. The CP-poor swollen
vRNA molecules will not be packaged. It follows that, for the model,
the OMR is $X = 1$. More generally, the OMR corresponds to charge
neutralization of the vRNA molecule by the CP tail groups.
Measurement of the OMR is thus a direct way to compare the theory
proposed in this paper for virion assembly with textbook
self-assembly theory.


If one interprets the smeared-out band in Fig. 11 as being produced
by CP-poor swollen vRNA globules then there would seem to be
agreement between the predictions of the proposed model and
experiment, and a direct violation of conventional self-assembly
theory. It should be recalled here that conventional self-assembly
theory works quite well for the assembly of empty capsids. This
interpretation can (and should) be questioned. In a solution that is
in complete thermodynamic equilibrium, each vRNA molecule should
fluctuate thermally between all of the allowed configurations. In an
electrophoresis experiment that is carried out on a system in full
equilibrium, the vRNA molecules should all move with an average speed
determined by a Boltzmann average over all accessible states, leading
to just one single band. Figure 11 indicates that the previous
experiments were carried out on time scales shorter than the thermal
equilibration time. Now, it seems reasonable to assume that the
lifetime of an assembled provirion state is the longest relaxation
time of the system. In an electrophoresis experiment carried out on
time scales shorter than this relaxation time but longer than any
other relaxation time, one should expect a bimodal distribution with
two narrow bands. The slow band contains provirions and the fast band
corresponds to aggregates that interconvert among each other on the
time scale of the experiment. A recent assembly study of CCMV with
much shorter 500-nt-long RNA fragments reported bimodal distributions
for nearly all CP:RNA ratios [55]. The natural interpretation is that
for the case of shorter RNA chains the system is closer to thermal
equilibrium. Recently, Kler and co-workers found, for the SV40 virus,
that titration of a short RNA molecule (less than 0.8 kb) with VP1
indeed gave a bimodal distribution while binding of VP1 to longer
RNAs again led to the formation of intermediate species [56]. Bimodal
distributions have also been observed in the in-vitro assembly of
cucumber mosaic virus (CMV) [57]. If the size of the vRNA molecule is
increased, then provirions that are missing variable amounts of CPs
are expected to have relatively long lifetimes. These would show as a
smearing of the slow band.

A second distinctive prediction of the model concerns the claim that
the CP tail groups must be effective vRNA condensing agents. If the
compression of the vRNA molecule was purely due to the action of the
CP head groups, which we argued against in the introduction, then
removal of the CP head groups should cause swelling of the vRNA
molecule, while the model predicts increased condensation. In refs.
[58, 59] it was shown that the vRNA molecules of the $T = 1$
Satellite Tobacco Mosaic Virus (STMV) remained in a fully condensed
state after the CP head groups had been enzymatically removed from
STMV virions while the tail groups of the CPs remained behind. The
resulting particles were thermodynamically very stable. X-ray
diffraction studies revealed a tight association between the tail
groups and the vRNA molecule.

Next, in the weak-charging limit, vRNA molecules in good solvent
should have a fractal structure with a radius of gyration $R(N)$ that
scales as a power law in the number of monomers $N$, while in the
strong-charging limit the molecules should be more linear and
extended, and the radius of gyration should be proportional to $N$.
In the condensed state, the radius of gyration should scale as
$N^{1/3}$. In either case, the MLD of the vRNA molecule should scale
as a power of the radius $R$, which can be tested experimentally.
Gopal and coworkers visualized CCMV RNA 2 (of 2.7 kb) molecules using
cryo-electron microscopy [60]. They found that, in a physiological
buffer without Mg$^{2+}$, the RNA molecules adopted highly extended
structures with just a few major branches. An example is shown in the
left panel of Fig. 13. The appearance of extended vRNA structures
suggests the strong-charging regime where the vRNA molecules are
effectively stretched by electrostatic repulsion. When the solution
concentration of Mg$^{2+}$ ions was increased, more compact,
spherical shapes appeared with a smaller radius comparable to that of
the virus itself, as shown in the right panel of Fig. 13. These
molecules had structures that were significantly more branched than
the swollen structures in good solvent. These results seem at least
consistent with the simple model of Sect. II.

FIG. 13. (Color online) Cryo-EM images of a 2777-nt RNA molecule
under assembly conditions (left panel) and under assembly conditions
with added ions (right panel). The image is reproduced from ref. [60]
with permission.
Has the provirion state been observed microscopically? Figure 14
shows cryo-EM images of 3,200-nt RNA molecules when CPs are added to
the solution. Figure 14(a) shows the RNA molecule in assembly buffer
in the absence of CPs, as in the previous picture. Note again the
elongated arms, indicative of strong electrostatic repulsion. Figure
14(b) shows the same RNA molecule when the CP to RNA mixing ratio $X$
is larger than 0.6. The images were taken at higher pH, when the
attractive interactions are too weak to support virion assembly. The
approximated structure of the RNA molecule is shown in the bottom
image. It clearly has undergone a certain degree of additional
condensation. The image shows shell fragments that are probably
transient. If this is the image of a provirion 1 then the description
of the surface-segregated CPs as a 2D correlated fluid will have to
be replaced by a more complex fluctuating state with a statistical
distribution over capsid fragments of various sizes. Figure 14(c)
shows the structure of the aggregate when the pH is reduced, so that
the strength of the CP-CP attraction is increased. If this is an
image of a provirion 1 then it would correspond more closely to the
description proposed in this paper, although it could also already be
a true virion. A key point would be to determine at what pH the shell
transforms from a fluid state that is in thermodynamic equilibrium
with the surrounding solution to an ordered $T = 3$ capsid with
frozen-in CP positions.

FIG. 14. (Color online) (Top) Cryo-EM images of 3,200-nt RNA during
the different stages of assembly. (Bottom) Reconstructions. (a) Shows
naked RNA in assembly buffer (see also Fig. 13). (b) Shows the same
RNA molecule but decorated with a super-stoichiometric amount of CP
at higher pH. Note that the complex is smaller than the naked RNA
molecule. Analysis of a large number of cryo-EM images shows that the
average size drops from about 37 nm when the RNA is naked to about
32 nm in the provirion state when it is decorated by CP. (c) Shows
the formation of capsid-like structures when CP-CP interactions are
strengthened by reducing the pH. Reproduced with permission from
Elsevier.


B. Tests of the model 

We now turn to the predictions of the model that allow future
experimental tests of its validity. The existence of a provirion 2
state plays a central role in this respect. According to the general
phase diagram (see Fig. 3), increasing the binding affinity $\epsilon$
should lead to a stabilization of the provirion 2 state with respect
to the provirion 1 state. Increasing $\epsilon$ could be done by
systematically increasing the number of positively charged arginine
residues on the CP tail groups, as was already done in earlier work.
Electron microscopy images of a provirion 2 state should be
characterized by strongly fluctuating, non-spherical shapes.
Increasing the strength of the CP-CP attraction starting from a
provirion 2 state should produce not virions but malformed
structures.

Next, reducing $\epsilon$, for example by reducing the number of charged
residues per tail group, would allow a second experimental test of
the proposed model. According to the assembly diagram, for smaller
values of $\beta\epsilon$, assembly as a function of increasing $u$ should
proceed without vRNA condensation. According to Fig. 12 the OMR
should then be $X = M/N \simeq 0.6$ (for the case of CCMV at least).
Moreover, the fraction of assembled virions measured as a function of
the total CP concentration should now obey the Law of Mass Action, as
is the case for empty capsids but as is not the case in the model if
assembly starts from a provirion precursor state.

Another important prediction of the model concerns the presence of
either a critical point or a first-order phase transition point in
the assembly diagrams (see Figs. 6 and 7), depending on the charging
parameter $\alpha$. This could be tested
by repeating the electrophoresis experiments discussed above but now
decreasing the magnitude of the negative second virial coefficient
$-V_1$. The second virial coefficient of the CPs could be measured
quantitatively in separate thermodynamic studies of CP pair formation
in dilute solutions of CPs. Variation of $V_1$ as a function of pH,
salinity, or tail length could then be determined. Measurement of the
disproportionation interval, in terms of the mixing ratio $X$, by gel
electrophoresis for different values of $V_1$ could verify whether
the weak-charging or the strong-charging regime applies. Recall here
the striking prediction of re-entrance of the single-phase region in
the assembly diagram as a function of $-V_1$ for the strong-charging
case.


The experiments discussed above could be repeated for different salt
concentrations. Reducing the salinity means increasing the strength
of the electrostatic interactions. Studies of the assembly of empty
CCMV capsids [11, 25, 26] reported that at higher CP concentration
and lower ionic strength, multishell structures form, stabilized by
electrostatic interactions in which the tail groups of the second
layer associate with the head groups of the first CP layer. It has
been shown that multi-layer shell structures form during assembly of
virions with shorter RNA molecules, as shown in Fig. 15 [55]. If the
gel-electrophoresis experiments discussed above were repeated at
lower salinity, then the excess CPs released in solution would now be
expected to remain associated with the CP shell in the form of a
second layer (see Figs. 4 and 5). By measuring the number of CPs that
remain associated with the virion after assembly, for example by
fluorescent labeling, it could be checked whether the number of excess
CPs equals the difference between the number of CPs of a saturated
aggregate and of a virion.

FIG. 15. (Color online) Cryo-EM images of assembly products of 500-nt RNA.
Multishell structures formed with an inner shell that roughly corresponds to
a T = 2 shell and an outer shell that corresponds to an incomplete T = 3
shell. Reprinted with permission from [55]. Copyright 2014 American Chemical
Society.


C. Overcharging 

In the introduction we posed the question why the total 
positive charge of the CP tail groups of the CPs is not neu¬ 
tralized by the negative charge of the RNA molecule. Recall 
that overcharging in the context of viral assembly previously 
has been attributed to the local structure of tail group/RNA 
association in ref. Cl and to Manning condensation in 
ref. 03 - In the proposed model, overcharging is attributed 
to correlations between the CP head groups: the homogeneous saturated
aggregate, in which the tail groups do neutralize the RNA charge, is
“frustrated” by the non-electrostatic, angle-dependent interactions that
drive the formation of the provirion state and the T = 3 capsid.

Could experiment resolve questions about the cause of the overcharging? It
is possible to enzymatically digest the head groups of a virion [63]. If the
proposed explanations of either
Refs, ini or d are right, then the macroion overcharge of 
the remaining RNA/tail-group core particle should remain sta¬ 
ble in a solution that contains a modest concentration of tail- 
group molecules, as it represents a minimum free energy state. 
On the other hand, in the model proposed here, the core particle should be
expected to soak up extra tail groups from the surrounding solution, causing
the overcharge to decrease to zero. Changes in the charge of a core particle
could be monitored in an electrophoresis experiment.

This method could also be used to measure the osmotic pressure that the
RNA/tail-group assembly exerts on the capsid. If this pressure is negative,
then the RNA/tail-group core particle ought to contract after enzymatic
digestion of the capsid, while in the opposite case it should expand. The
radius of the core particles in solution could be measured by AFM, as was
already done [63], or in a scattering experiment. By adjusting the osmotic
pressure of the surrounding solution until the radius of the core particle
equals the inner radius Rc of the capsid, one could establish the osmotic
pressure inside the virion. In the proposed model, the core particle should
expand after digestion of the capsid.


D. Limitations of the model 

We finish by discussing the limitations of the proposed model. Applying the
model to the complexity of viral assembly involves both straightforward
simplifications, which could impair quantitative predictions but can be
improved upon in a systematic fashion, and more “dangerous” assumptions whose
failure would compromise the usefulness of the model at a fundamental level.

Straightforward simplifications involve equating the magni¬ 
tudes of the head group and tail group charges, assuming a lin¬ 
ear dependence of the second virial coefficient on occupancy, 
assuming a Gaussian dependence of the surface second virial 
coefficient on angle and assuming that the third virial coeffi¬ 
cient of a CP/vRNA saturated aggregate does not depend on 
occupancy. Higher-order terms may have to be systematically 
included in the virial expansion. In order to carry out quanti¬ 
tative tests, these assumptions may have to be improved upon. 
However, we believe, although we have not explicitly demonstrated, that none
of the key predictions discussed earlier will be affected if the model is
generalized.

A more serious limitation of the model concerns the use of 
mean-field theory. We made the assumption that the interior 
of a surface-segregated globule is homogeneous. In actuality, 
because CP tail-groups are attached to the surface-segregated 
CP head-groups, neutralization of the negative vRNA charges 
must be more efficient near the surface of the globule than in 
the interior. As a result, the macro-ion charge density will 
have a radial profile. Mean-field theories allowing spatial variation of the
density can be formulated, but this would seriously complicate the formalism.
Fluctuations around mean-field theory, even one that includes a non-trivial
density profile, are neglected as well. As discussed in the conclusion, if
the surface of a provirion is better described as a collection of transient
shell fragments instead of a correlated but uniform fluid, then this would be
a serious concern for the theory that would not be easy to remedy.

Another “dangerous” limitation of the model is the use of PB theory to
describe the electrostatics. It can be shown that the condensation of dsDNA
molecules by polyvalent counterions is due to correlation attraction. This
effect is beyond PB theory [34] and it is possible, even likely, that the
same is true for the condensation of ssRNA molecules. This problem was to
some extent “swept under the rug” by including correlation attraction effects
as negative contributions to the effective second virial coefficient. The
applicability of PB theory can be monitored by measuring the OMR. Serious
breakdown of PB theory would be signaled by the appearance of overcharging
[64] of saturated aggregates, since these would no longer correspond to a
state in which the tail groups neutralize the RNA molecules. That would mean
that X = 1 would not correspond to the OMR. For CCMV at least, that appears
not to be the case but it could well be true for other viruses.

A final important limitation of the model is the restriction to equilibrium
thermodynamics. The assembly of empty capsids follows the law of mass action
of equilibrium statistical mechanics. In actuality, however, empty capsids do
not disassemble when the CP concentration in solution is reduced back to
zero. The final assembly step or steps are quasi-irreversible. This must be
true as well for virions, or else virions would disassemble in CP-free
solutions, which is not the case. Kinetic models of empty capsid assembly
confirm that a form of the law of mass action survives when only a small
number of steps are irreversible [65]. In general, equilibrium models are
expected to fail progressively as the number of irreversible assembly steps
increases. It is our belief, however, that an understanding of viral assembly
in general requires understanding viral assembly under conditions of thermal
equilibrium.


SUPPLEMENTARY MATERIAL

V. SECONDARY STRUCTURE OF VIRAL RNA MOLECULES


The secondary structure of single-stranded RNA molecules is determined by
the complementary pairing of nucleotides. Figure 16 shows a typical minimum
free energy secondary structure of a T=3 viral RNA molecule obtained by
applying the Mfold program [66]. The top figure is that of a vRNA molecule of
about 4,000 nucleotides. The bottom figure is that of an RNA molecule that
has the same number of nucleotides but with a randomly chosen nucleotide
sequence. Both molecules are composed of short complementary paired sequences
that alternate with unpaired bubbles and branch points. The size of the
molecule is determined by the maximum ladder distance S, defined as the
longest distance between ends of the branched structure, counting only paired
nucleotides [66]. The figure indicates that viral RNA molecules tend to have
shorter maximum ladder distances than random sequences. The secondary
structure of a large ssRNA molecule is not fixed. For a vRNA molecule,
thousands of alternative secondary structures may be found with free energies
that differ by less than k_BT from the minimum free energy state. The
structures shown in Fig. 16 thus should only be viewed as representative of a
large family of secondary structures with nearly the same free energy. The
entropy associated with this quasi-degeneracy plays an important role in the
theory.

FIG. 16. Secondary structures of a vRNA genome molecule (A) and of an RNA
molecule with a random sequence of nucleotides (B) with the same length.
Panel A: phage Qβ ssRNA genome, 4215 nt. The longest end-to-end lengths S of
the molecules, known as the maximum ladder distance, are indicated by yellow
lines. Reprinted with permission from Ref. m (Copyright (2008) National
Academy of Sciences, U.S.A.).
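
As an illustration of the maximum ladder distance, the following minimal
Python sketch computes it for a pseudoknot-free secondary structure written
in dot-bracket notation. The dot-bracket input format and the function name
maximum_ladder_distance are assumptions made for this example and are not
taken from the source; each base pair is counted as one rung, and the
structure is treated as a tree whose longest path is the ladder distance.

```python
def maximum_ladder_distance(dot_bracket: str) -> int:
    """Longest path through a secondary structure, counted in base pairs.

    Assumes a pseudoknot-free structure in dot-bracket notation, where
    '(' and ')' mark paired nucleotides and '.' marks unpaired ones.
    """
    # Each stack frame holds the two largest "heights" (in base pairs) of
    # the sub-structures hanging off the loop currently being scanned.
    stack = [[0, 0]]  # frame 0 is the exterior loop
    best = 0
    for ch in dot_bracket:
        if ch == "(":
            stack.append([0, 0])
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: extra ')'")
            h1, h2 = stack.pop()
            # Longest path that turns around inside this loop.
            best = max(best, h1 + h2)
            height = h1 + 1  # crossing this base pair adds one rung
            parent = stack[-1]
            if height > parent[0]:
                parent[0], parent[1] = height, parent[0]
            elif height > parent[1]:
                parent[1] = height
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: missing ')'")
    h1, h2 = stack[0]
    return max(best, h1 + h2)
```

For example, a single stem of three pairs next to a stem of two pairs,
"(((...)))((...))", has a ladder distance of five, since the longest path
runs down one stem and up the other.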


VI. SECOND VIRIAL COEFFICIENT Vi 

The second virial coefficient V_1 of saturated aggregates plays the role of
an effective temperature in the model. Here, we estimate different
contributions to V_1. We estimate the electrostatic contribution V_e to V_1
as the second virial coefficient of a saline solution containing spherical
particles of diameter D and charge Q. For particle radii larger than the
Debye screening length, so for κD ≫ 1, V_e follows from the DH approximation
in a monovalent salt solution [67]. For CCMV the head-group charge Qe
decreases from about 12e to about 10e as the pH is decreased from pH ≈ 7,
when empty capsids do not form, towards the isoelectric point around pH ≈ 5,
when empty capsids do readily assemble Col. For D equal 
to 3 nm, V_e decreases from about 30 nm^3 to about 20 nm^3. The next term,
−V_a, is due to the attractive interactions between CPs. The main
contribution is hydrophobic attraction, which does not depend on pH. Since in
the absence of RNA, empty capsids do not assemble at neutral pH but do
assemble as the pH is reduced, the magnitude V_a of the attractive
interaction is estimated to be between 20 and 30 nm^3. Finally, when RNA is
present, the RNA molecule can mediate attractive interactions by bridging
(see Fig. 17), with a contribution −V_t to the second virial coefficient. For
CCMV, the tail charge is not significantly dependent on the pH, in which case
V_t is not expected to depend on the pH, though it should depend on the ionic
strength.


$$V_1 \simeq V_e - V_a - V_t \qquad \text{(VI.1)}$$

The sum of the first two terms is positive for neutral pH. In order for V_1
to be negative under reduced pH, when CCMV virions form, but not for neutral
pH, when stable CCMV virions do not form, V_t should be in the range of a few
times nm^3. Given this complexity, instead of attempting to calculate V_1 it
may be more practical to directly measure V_1 through the equation of state
of a mixture of CPs and nanometer-length ssRNA segments with a concentration
ratio X = 1.

FIG. 17. Attraction between capsid proteins can be mediated by the
association of positively charged tail charges with negatively charged RNA
segments.


VII. AQUEOUS ELECTROSTATICS

In this section we discuss Poisson-Boltzmann (PB) theory as applied to the
proposed model (see, e.g., [68, Chapter 3], [69, Chapter 10]).


A. Charged Spheres

As a simple model for the interior of the virion, assume a uniformly charged
sphere of charge Qe and radius R in a solution of monovalent salt ions with
concentration c_s. The sphere is permeable to the salt ions. The charging
parameter is defined as α(R) = Q/(2 c_s V(R)) with V(R) = (4/3)πR^3. The PB
electrostatic free energy of the charged sphere equals:

(VII.1)

The electrical potentials inside and outside the sphere are nearly constant
but at the surface of the sphere there is a potential drop by an amount ΔΦ
given by:

$$\frac{e\,\Delta\Phi}{k_B T} = -\ln\left(\alpha + \sqrt{1+\alpha^2}\right), \qquad \text{(VII.2)}$$

which is a version of the Donnan potential. The electrostatic free energy in
the weak-charging regime α(R) ≪ 1 is quadratic in α(R):

$$\frac{F_{PB}}{k_B T} \sim \frac{Q^2}{V c_s}, \qquad \alpha(R) \ll 1 \qquad \text{(VII.3)}$$

In the strong charging regime α(R) ≫ 1, it adopts the form

$$\frac{F_{PB}}{k_B T} \simeq Q \ln\frac{Q}{V c_s}, \qquad \alpha(R) \gg 1 \qquad \text{(VII.4)}$$

This expression can be viewed as the entropic free energy of the counterions
of the macromolecule. In the strong charging regime, the Donnan potential is
of the order of k_BT/e. The electrostatic free energy difference of a
monovalent counterion inside and outside the sphere is thus comparable to the
thermal energy in the strong charging regime.
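
The two charging regimes of the Donnan potential can be checked numerically
with a short Python sketch. It assumes that the charging parameter is
α = Q/(2 c_s V), the form for which Eq. (VII.2) reduces to the standard
Donnan result, and it obtains a free energy by a charging integral of the
Donnan potential. That integral and both function names are assumptions made
for this illustration; the result is not claimed to coincide with the
expression of Eq. (VII.1).

```python
import math


def donnan_potential(alpha: float) -> float:
    """Dimensionless potential drop e*dPhi/(k_B T) of Eq. (VII.2)."""
    return -math.log(alpha + math.sqrt(1.0 + alpha * alpha))


def donnan_free_energy(alpha: float) -> float:
    """Free energy in units of 2 c_s V k_B T (assumed charging integral).

    Closed form of the integral of asinh(a) from a = 0 to a = alpha,
    i.e. the reversible work of charging against the Donnan potential.
    """
    return alpha * math.asinh(alpha) - math.sqrt(1.0 + alpha * alpha) + 1.0


if __name__ == "__main__":
    for alpha in (0.01, 0.1, 1.0, 10.0, 100.0):
        weak = 0.5 * alpha * alpha                      # quadratic limit
        strong = alpha * (math.log(2.0 * alpha) - 1.0)  # logarithmic limit
        print(alpha, donnan_potential(alpha),
              donnan_free_energy(alpha), weak, strong)
```

In these units the weak-charging limit is α^2/2, which is quadratic in α,
while the strong-charging limit grows as α ln α, in line with the entropic
interpretation given above.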


B. Charged Shells 


The PB electrostatic free energy per unit area of a charged shell of radius R
with surface charge density eσ equals (see, e.g., [68, Chapter 3])


FpB 

keTA 


= 2a 


Ak 


(VII.5) 

in the large R limit. Here, λ = 1/(πσ l_B) is the Gouy-Chapman (GC) length
and κ is again the inverse of the Debye screening length. In the
weak-charging regime λκ ≫ 1, formula (VII.5) simplifies to


^PB ^ 

ksTA K 


(VII.6) 
while in the strong charging limit, λκ ≪ 1,


FpB 

keTA 


~ 2cr 


In --1 

Xk 


(VII.7) 


For smaller R, curvature effects do contribute to the surface electrostatic
free energy [70] but this can be neglected in our case.


The coefficient “two” in front of Eq. (VII.7) may seem strange at first
glance: Q_h ρ is the 2D concentration of counterions, so one could have
expected just an ideal gas of counterions contributing a surface osmotic
pressure (or lateral pressure) Π = k_BT Q_h ρ. In fact, counterions are
localized around the surface within a layer whose thickness is the GC length
λ, which depends on the surface area through the surface charge density σ.
The ideal gas of counterions is confined to a volume Aλ ∝ A^2; it is this
that produces the factor of two upon differentiation with respect to area
when the osmotic pressure is computed. Physically, it means that the lateral
pressure of counterions is enhanced by the fact that the counterions are
attracted to the surface.
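
The origin of the factor of two can be illustrated with a few lines of
Python. The sketch below is a toy model under stated assumptions: the
counterions form an ideal gas in a layer of volume Aλ, the layer thickness
scales as λ ∝ A/n for n counterions, and the prefactor length_scale is
hypothetical and drops out of the pressure. The lateral pressure is obtained
by numerical differentiation of the free energy with respect to area.

```python
import math


def layer_free_energy(area, n_ions, length_scale=1.0, fixed_thickness=None):
    """Ideal-gas free energy F/(k_B T) of n_ions in a surface layer.

    If fixed_thickness is None the layer thickness follows the assumed
    Gouy-Chapman scaling lambda = length_scale * area / n_ions; otherwise
    the thickness is held constant.
    """
    if fixed_thickness is None:
        thickness = length_scale * area / n_ions
    else:
        thickness = fixed_thickness
    volume = area * thickness
    return n_ions * (math.log(n_ions / volume) - 1.0)


def lateral_pressure(area, n_ions, rel_step=1e-6, **kwargs):
    """Pi/(k_B T) = -dF/dA by a central finite difference."""
    d_area = area * rel_step
    f_plus = layer_free_energy(area + d_area, n_ions, **kwargs)
    f_minus = layer_free_energy(area - d_area, n_ions, **kwargs)
    return -(f_plus - f_minus) / (2.0 * d_area)
```

With these assumptions, lateral_pressure(100.0, 50) is close to 2 × 50/100,
twice the two-dimensional ideal gas value, whereas holding the layer
thickness fixed (fixed_thickness=1.0) returns the ideal gas value itself.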


C. Weak and strong charging regimes of confined 
polyelectrolyte molecules. 


The uniform charge model is not quite adequate in the weak-charging regime
to describe a confined polyelectrolyte molecule. Let l be the persistence
length of a highly charged polyelectrolyte molecule with total length L = Nl
and total charge Qe. Assume that the molecule is confined inside a sphere
of radius R. The charging parameter is then a{R) = 

In the strong charging regime α ≫ 1 the counterions are distributed
relatively uniformly over the globule (see Fig. 18).

FIG. 18. Weak and strong charging regimes. Panels: weak charging (α ≪ 1),
with eΔV < k_BT, and strong charging (α > 1), with eΔV ≈ k_BT.

The electrostatic free energy of the molecule in the strong charging regime
α(R) ≫ 1 can be obtained by applying Eq. (VII.4):


The electrostatic free energy of the polyelectrolyte molecule in the
weak-charging regime α(R) ≪ 1 cannot be obtained from the charged sphere
expression because the counterions of the polyelectrolyte molecule are no
longer uniformly distributed but confined to a tube surrounding the
polyelectrolyte molecule with a radius that is of the order of the screening
length (see Fig. 18). The DH expression for the electrostatic free energy of
a polyelectrolyte molecule was obtained by Onsager [71] and it has the form


FpB 



a{R) <C 1 


(VII.9) 


The actual validity condition of the Onsager/DH result is that the volume
Lκ^-2 of the Debye screening cloud surrounding the N segments has to be less
than the volume of the sphere, but this reduces to α(R) ≪ 1.


VIII. ASSEMBLY OF VIRIONS: LAW OF MASS ACTION 


In this section we apply the Law of Mass Action (LMA) to virion assembly.
Recall that the LMA is known to be a good description for the assembly of
empty capsids. The starting point is the free energy of a dilute solution of
proteins and RNA molecules expressed as:


F 




N , 

+ H ([C'(P)] In 

p—1 ^ 



E{p)[Cip)]^ 


(VIII. 1) 


The first term is the solution entropic free energy of free CPs, with φ_f
the concentration of free CPs and c_0 ≃ 1/D^3 the dense-packing concentration
of virions. The first part of the second term is the solution entropic free
energy of aggregates composed of one vRNA molecule with a varying number of
proteins. [C(p)] is the concentration of vRNA molecules associated with p CP
molecules. N is the maximum number of CPs per aggregate. In the last term,
E(p) is the association energy of a p-protein aggregate. This free energy is
to be minimized subject to the conservation laws for the RNA and the CP
molecules:

$$\phi_{RNA} = [RNA]_f + \sum_{p=1}^{N} [C(p)], \qquad \phi_{CP} = \phi_f + \sum_{p=1}^{N} p\,[C(p)] \qquad \text{(VIII.2)}$$

Here, φ_RNA is the total RNA concentration, [RNA]_f the concentration of free
RNA molecules not associated with CPs, while φ_CP is the total concentration
of CP molecules. Minimization with respect to [C(p)] subject to these
constraints leads to


FpB 



a{R) » 1 (VII.8) 


$$[C(p)] = [RNA]_f\,\bar\phi_f^{\,p}\,\exp\left(E(p)/k_B T\right), \qquad \text{(VIII.3)}$$
which can be viewed as expressing the LMA or as expressing the Boltzmann
distribution in the Grand Canonical Ensemble. The concentration φ_f of free
CPs must obey the condition

$$\phi_{CP} = \phi_f + \phi_{RNA}\,\langle p \rangle \qquad \text{(VIII.4)}$$

Here, ⟨p⟩ is the expectation value of the number of CPs per RNA molecule
obtained from the probability distribution [C(p)]/φ_RNA.

First consider the simplest case: a solution that contains only free CPs,
free vRNA molecules and assembled virions. The self-consistency condition
Eq. (VIII.4) then reads:

$$\bar\phi_{CP} = \bar\phi_f + M \bar\phi_{RNA}\,\frac{\bar\phi_f^{\,M} \exp\left(E_v/k_B T\right)}{1 + \bar\phi_f^{\,M} \exp\left(E_v/k_B T\right)} \qquad \text{(VIII.5)}$$

where E_v is the association energy of a capsid and where all concentrations
with a bar have been made dimensionless by dividing by c_0.

The solution of this equation is shown in Fig. 12 of the main text. For φ̄_f
less than φ* = exp(−(E_v/M)/k_BT), the second term on the right-hand side of
the equation can be neglected so φ̄_f ≃ φ̄_CP: the CPs are nearly all free in
solution, as are the RNA molecules (the uniform phase of Fig. 3). When φ̄_CP
approaches φ*, the second term starts to rise very sharply. For larger φ̄_CP,
φ̄_f remains pegged at φ* while the virion concentration grows linearly as
C(M) ≃ (φ̄_CP − φ*)/M. This stops when φ̄_CP exceeds φ* + M φ̄_RNA, where nearly
all RNA molecules have been encapsidated. The concentration of free CPs then
starts to grow again as φ̄_CP is increased beyond this point. The optimal
concentration ratio between the vRNA and CP concentration is thus the
stoichiometric ratio of the virion.
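
The three regimes described above can be reproduced by solving the
self-consistency condition of Eq. (VIII.5) numerically. The following Python
sketch is a minimal illustration that uses bisection on the free CP
concentration. The function names and the numerical values chosen for φ*,
the RNA concentration and M are assumptions made for the example, not values
from the source.

```python
import math


def encapsidated_fraction(phi_f, phi_star, m_cp):
    """Fraction of RNA molecules in virions, a/(1 + a), a = (phi_f/phi*)^M."""
    log_a = m_cp * (math.log(phi_f) - math.log(phi_star))
    if log_a > 700.0:      # avoid overflow in exp
        return 1.0
    if log_a < -700.0:     # a is numerically zero
        return 0.0
    a = math.exp(log_a)
    return a / (1.0 + a)


def free_cp_concentration(phi_cp, phi_rna, phi_star, m_cp=180, iters=200):
    """Solve phi_cp = phi_f + M*phi_rna*a/(1 + a) for phi_f by bisection.

    All concentrations are dimensionless (divided by c_0). The right-hand
    side increases monotonically with phi_f, so the root is unique.
    """
    lo, hi = 0.0, phi_cp
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = mid + m_cp * phi_rna * encapsidated_fraction(mid, phi_star, m_cp)
        if total < phi_cp:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    phi_star, phi_rna = 1.0e-3, 1.0e-5   # illustrative values only
    for phi_cp in (5.0e-4, 2.0e-3, 5.0e-3):
        print(phi_cp, free_cp_concentration(phi_cp, phi_rna, phi_star))
```

For these illustrative numbers the free CP concentration follows the total
CP concentration below φ*, stays within about one percent of φ* while RNA is
being encapsidated, and rises again once φ̄_CP exceeds φ* + M φ̄_RNA.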

The CP concentration φ̄_CP is the essential thermodynamic variable that
regulates assembly in the LMA description. The boundary for virion assembly
as a function of thermodynamic parameters is determined by the condition
φ̄_CP = φ* or

$$E_v = -M k_B T \ln \bar\phi_{CP} \qquad \text{(VIII.6)}$$

which corresponds to equating the CP chemical potential in solution with
that of a CP that is part of a virion.

To arrive at an explicit expression for the transition line in the (ε, u)
plane we need an expression for the association energy E_v of an assembled
virion composed of one vRNA molecule and M CPs. In a naive model, E_v(ε, u)
could be written as the sum of three parts:

$$E_v(\epsilon, u) = \left(\epsilon + \tfrac{1}{2} z_v u\right) M - \Delta F_v \qquad \text{(VIII.7)}$$

In the first term, ε is again the association free energy between a CP tail
and a vRNA molecule, z_v is the mean number of neighbors of a CP incorporated
in the ordered capsid of a virion (between 5 and 6), and u is the CP-CP
pairing energy. The last term, ΔF_v, is the change in free energy of the vRNA
molecule before and after encapsidation, not counting the CP/RNA association
energy. It includes the free energy associated with RNA condensation and it
may be negative or positive, but it is not expected to depend on ε and u.
Under this ansatz, the transition line for the onset of virion assembly is a
straight line in the (ε, u) plane [72].
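
As a small worked example, the straight transition line can be written out
by combining Eq. (VIII.6) with the ansatz of Eq. (VIII.7). The Python sketch
below assumes that the pairing term enters as (1/2) z_v u per CP, which is
our reading of Eq. (VIII.7); all energies are in units of k_BT, and the
function name and default values are hypothetical.

```python
def transition_pairing_energy(eps, ln_phi_cp, z_v=5.5, delta_f_per_cp=0.0):
    """CP-CP pairing energy u on the assembly line, in units of k_B T.

    Obtained by equating (eps + z_v*u/2)*M - dF_v with -M*ln(phi_cp) and
    dividing by M, where delta_f_per_cp = dF_v/M (assumed ansatz).
    """
    return 2.0 * (delta_f_per_cp - ln_phi_cp - eps) / z_v


if __name__ == "__main__":
    # ln(phi_cp) = -6 is used only as an illustrative value.
    for eps in (0.0, 2.0, 4.0, 6.0):
        print(eps, transition_pairing_energy(eps, ln_phi_cp=-6.0))
```

The line has slope −2/z_v, so a stronger tail-RNA association energy lowers
the CP-CP pairing energy needed for the onset of assembly.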

One can include in the LMA description assembly intermediates in the form of
partially assembled shells. The energy cost of the edge of a partial shell is
of the order of u times the number of CPs that constitute the edge of a
partial shell. The maximum value of the edge length is of the order of √M. It
follows from the fact that u√M is large compared to k_BT when u is of the
order of a few k_BT that, under conditions of thermodynamic equilibrium, the
concentration of assembly intermediates is negligible as compared to that of
free CPs
and assembled virions. This argument is the same as the rea¬ 
son why the concentration of partial shells is negligible during 
the assembly of empty capsids ca. 


IX. CHEMICAL POTENTIAL AND 
DISPROPORTIONATION 


If ε/k_BT is not large compared to one then the mean segment occupancy
probability ⟨x⟩ must be less than X since some CPs now will remain free in
solution, say at a concentration of φ_f. The condition of phase equilibrium
between these free CPs and those that are part of an aggregate is that they
must have the same CP chemical potential μ. The concentration φ_f of free CPs
is given by

$$\phi_f = \phi_{CP}\left(1 - \frac{\langle x \rangle}{X}\right) \qquad \text{(IX.1)}$$

so μ = k_BT ln(φ_CP (1 − ⟨x⟩/X)/c_0).

Equating μ to the chemical potential inside an aggregate gives:

(IX.2)

Here, F(x) is the value of the Flory variational free energy F(R, S, x), see
Eq. II.1, after minimization with respect to R and S for fixed x. The
argument of F(x), the occupancy of a particular aggregate, must be
distinguished here from ⟨x⟩, its average over all aggregates. Though ⟨x⟩
necessarily is less than X, individual aggregates can have an occupancy that
exceeds X.

It is convenient to replace μ with the CP chemical potential
μ_CP = k_BT ln(φ_CP/c_0) of CPs in the absence of vRNA molecules. The latter
quantity, but not the former, is under experimental control (it is roughly in
the range of −6 for the in-vitro experiments discussed in the conclusion). In
terms of μ_CP, the condition of phase equilibrium for finite ε translates to


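$$\frac{\langle x \rangle}{X} = 1 - \exp\left[-\frac{\mu_{CP} + \epsilon}{k_B T} + \frac{1}{N k_B T}\frac{dF}{dx}\right] \qquad \text{(IX.3)}$$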


We now can use Eq. (IX.3) to follow the disproportionation process as a
function of the mixing ratio X and ε + μ_CP using the common tangent method.
The result is shown in Fig. 19.

FIG. 19. Disproportionation of a solution of RNA and capsid proteins. The
vertical axis ⟨x⟩ is the probability that an RNA segment is associated with a
capsid protein (CP). The horizontal axis is the chemical potential μ_CP of
the CPs in the absence of RNA plus the tail-RNA binding energy. The two solid
lines are curves with the same fixed second virial coefficient V_1, below its
critical value, but different macroscopic mixing ratios X, with X = 1 a
saturated aggregate. If ⟨x⟩ is in the interval between x_1 and x_2 there is
disproportionation into CP-poor and CP-rich aggregates, as shown by the
vertical blue line. Note that the coexistence line is smeared. If X is
reduced from X_1 to X_2 < x_2, then the solution remains disproportionated
for large values of ε + μ_CP.

The two solid lines are loci with the same second virial coefficient V_1,
fixed at a negative value below the critical point, with a corresponding
disproportionation interval [x_1, x_2]. For the top curve, the macroscopic
mixing ratio X_1 is in the interval [x_2, 1]. For μ_CP + ε ≫ k_BT, ⟨x⟩
approaches X_1. If μ_CP + ε is reduced, then so is the mean occupancy ⟨x⟩.
Disproportionation starts when ⟨x⟩ drops below x_2. The system
disproportionates between x_1 and x_2, according to the usual tie rule (see
Fig. 19). Note the kink in the dependence of ⟨x⟩ on μ_CP + ε at ⟨x⟩ = x_2 and
⟨x⟩ = x_1. For small μ_CP + ε, when all globules have converted to coils, the
mean occupancy eventually goes to zero. If the mixing ratio X is reduced to a
value X_2 < x_2 then the system remains in the two-phase region for large
(μ_CP + ε). The physical reason for the “smearing” of the transition, as
compared to conventional phase separation, is that the experimentally
accessible chemical potential (μ_CP + ε) is not the actual chemical
potential.
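
The tie rule used above can be made explicit with a short Python sketch. It
assumes the standard tie-line (lever) rule for two coexisting populations of
aggregates with occupancies x_1 and x_2; the function name and the numbers in
the usage note are hypothetical.

```python
def lever_rule(mean_x, x1, x2):
    """Fractions of CP-poor (x1) and CP-rich (x2) aggregates.

    Assumes two coexisting populations whose occupancies average to mean_x,
    with x1 <= mean_x <= x2 (standard lever rule).
    """
    if not x1 < x2:
        raise ValueError("x1 must be smaller than x2")
    if not x1 <= mean_x <= x2:
        raise ValueError("mean_x lies outside the coexistence interval")
    rich = (mean_x - x1) / (x2 - x1)
    return 1.0 - rich, rich
```

For instance, with a coexistence interval from 0.2 to 0.8 and a mean
occupancy of 0.5, half of the aggregates are CP-poor and half are CP-rich.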


X. SURFACE TENSION 


An aspect of the globule state that is not covered by Flory theory concerns
the surface tension of the vRNA globule, denoted by γ_0. The globule surface
tension is related to the so-called “blob size” ξ, the correlation length, by
the scaling relation γ_0 ≈ k_BT/ξ^2 (see refs. [39, Section 20] and
Gil Section 3.3.2]). We estimate γ_0 as the characteristic condensation
energy per segment k_BT V^2/W divided by the area
per segment For the previously estimated values of V, 


W, and I used, 70 would be in the range of erg/cm^ (about 
an order of magnitude less than that of a typical liquid). Because of this
surface tension, polymer globules tend to aggregate when they come in
contact. A fused aggregate globule composed of two polymers has a lower
surface area than two separate globules. A coil-to-globule transition is
likely to induce macroscopic phase separation of a solution of vRNA
molecules, since vRNA globules would fuse under the action of surface
tension. Under physiological conditions, vRNA molecules do not aggregate in
this fashion. The natural interpretation is that the second virial
coefficient V is positive for vRNA molecules under physiological conditions.
Saturated aggregates and assembled virions should not aggregate either. This
is indeed prevented by the fact that the total charge of CPs is small so
CP-RNA aggregates remain charged. Electrostatic repulsion can then prevent
aggregation.


XI. OPTIMAL MIXING RATIO 


In this section we compute the optimal mixing ratio, or OMR, using the Law
of Mass Action (LMA) for solutions in thermal equilibrium that contain free
CPs, free RNA molecules, virions, and saturated aggregates. Minimization of
the solution free energy of Section IV gives

$$[\phi_{CP}] = [\phi_{CP}]_f + [\phi_{RNA}] \left( \frac{M [\phi_{CP}]_f^{M} e^{E_v/k_B T} + N [\phi_{CP}]_f^{N} e^{E_{pv}/k_B T}}{1 + [\phi_{CP}]_f^{M} e^{E_v/k_B T} + [\phi_{CP}]_f^{N} e^{E_{pv}/k_B T}} \right) \qquad \text{(XI.1)}$$

All concentrations are here dimensionless. The quantity inside the large
brackets is the expectation value of the number of CPs associated with an RNA
molecule in the grand-canonical ensemble, with k_BT ln[φ_CP]_f the chemical
potential of the CPs. E_v and E_pv are the assembly free energies of an M-CP
virion and an N-CP provirion, respectively. Consider the case that the
macroscopic CP to RNA mixing ratio X = [CP]/(N[RNA]) is less than one. The
stoichiometric mixing ratio that corresponds to virion assembly is
X* = M/N. The OMR is the value of X that maximizes the fraction E(X) of RNA
molecules that is part of a virion, where

$$E(X) = \frac{\left(z_v [\phi_{CP}]_f\right)^{M}}{1 + \left(z_v [\phi_{CP}]_f\right)^{M} + \left(z_{pv} [\phi_{CP}]_f\right)^{N}} \qquad \text{(XI.2)}$$

where z_v = exp(E_v/(M k_BT)) is the Boltzmann factor per CP in the virion
state and z_pv = exp(E_pv/(N k_BT)) the CP Boltzmann factor in the provirion
state. The transition from a provirion to a virion state as a function of the
CP-CP binding energy u requires that the CP/RNA binding energy ε is large
compared to k_BT, which means that both z_v and z_pv are large compared to
one. The conservation law for CPs then reduces to

$$X \simeq \frac{(M/N)\left(z_v [\phi_{CP}]_f\right)^{M} + \left(z_{pv} [\phi_{CP}]_f\right)^{N}}{1 + \left(z_v [\phi_{CP}]_f\right)^{M} + \left(z_{pv} [\phi_{CP}]_f\right)^{N}} \qquad \text{(XI.3)}$$

which simplifies to

$$X \simeq (X^* - X)\left([\phi_{CP}]_f z_v\right)^{M} + (1 - X)\left([\phi_{CP}]_f z_{pv}\right)^{N} \qquad \text{(XI.4)}$$

where we introduced the notation X* = M/N for the stoichiometric mixing
ratio.

For mixing ratios X below the stoichiometric ratio X*, both terms on the
right-hand side of Eq. (XI.4) are positive. If both [φ_CP]_f z_v and
[φ_CP]_f z_pv are less than one then the equation has no solution, since the
left-hand side is of the order of one while both terms on the right-hand side
are small compared to one (assuming that both M and N are large compared to
one). If z_pv exceeds z_v then, with increasing [φ_CP]_f, the factor
[φ_CP]_f z_pv will reach one first, at which point the provirion term (the
second term on the right-hand side) is of the order of one. Since
[φ_CP]_f z_v will be less than one, the virion term is small compared to one
when raised to the power M. The equation can be solved and the concentration
of free CPs is

$$[\phi_{CP}]_f\, z_{pv} \simeq \left(\frac{X}{1 - X}\right)^{1/N} \qquad \text{(XI.5)}$$


Note that the right-hand side is close to one unless X is close to zero or
one. There are practically no virions in this case. If z_pv is less than z_v,
then the roles of virions and provirions are exchanged. The solution of
Eq. (XI.4) now leads to

$$[\phi_{CP}]_f\, z_v \simeq \left(\frac{X}{X^* - X}\right)^{1/M} \qquad \text{(XI.6)}$$

Inserting this into the expression for the fraction of encapsidated RNA
molecules gives

$$E(X) \simeq X/X^*, \qquad X < X^* \qquad \text{(XI.7)}$$

Now, there are practically no provirions. The provirion to virion transition
line is thus given by z_pv = z_v or by E_v/M = E_pv/N. This means that, at
the transition point, the assembly energy per CP must be the same for virions
and provirions. One way to understand this result is by noting that it is
equivalent to demanding that the chemical potential of a CP that is part of a
virion must be the same as that of a CP that is part of a provirion at the
transition point.

Next consider the case that the mixing ratio exceeds the stoichiometric
ratio (so X > X*). The excess CPs compete with the CPs that are part of a
virion for access to RNA. Now the first term of Eq. (XI.4) is negative. If
z_pv exceeds z_v then with increasing [φ_CP]_f the factor [φ_CP]_f z_pv again
reaches one first. The positive second (provirion) term exceeds the negative
virion term and the same result ensues as before. However, in the opposite
case that z_v exceeds z_pv, no solution appears when [φ_CP]_f z_pv reaches
one because the virion term is negative. Instead, [φ_CP]_f must be increased
further until the provirion term cancels the virion term. This happens when

$$(X - X^*)\left([\phi_{CP}]_f z_v\right)^{M} \simeq (1 - X)\left([\phi_{CP}]_f z_{pv}\right)^{N} \qquad \text{(XI.8)}$$

The solution contains a mixture of virions and provirions and only few free
RNAs or CPs. From this condition, the fraction of encapsidated RNA molecules
follows directly:

$$E(X) \simeq \frac{1 - X}{1 - X^*}, \qquad X > X^*$$

The function E(X) has a cusp maximum at X* = M/N, the stoichiometric ratio
for the assembly of a virion. We conclude that according to the LMA, X* is
the optimal mixing ratio for virion assembly under conditions of full
thermodynamic equilibrium and binding energies large compared to the thermal
energy.
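
The cusp of E(X) at the stoichiometric ratio can be checked numerically. The
Python sketch below is an illustration under stated assumptions: it solves
the CP conservation law for a three-state model (free RNA, virion with M CPs,
provirion with N CPs) by bisection on the logarithm of the free CP
concentration and then evaluates the virion fraction. The values N = 240,
ln z_v = 14, ln z_pv = 13.9 and the dimensionless RNA concentration are
hypothetical choices with z_v > z_pv; they are not parameters from the
source.

```python
import math


def _occupations(log_phi_f, ln_zv, ln_zpv, m_cp, n_cp):
    """Probabilities that an RNA is a virion or a provirion (log-safe)."""
    log_a = m_cp * (ln_zv + log_phi_f)    # ln of (z_v * phi_f)^M
    log_b = n_cp * (ln_zpv + log_phi_f)   # ln of (z_pv * phi_f)^N
    shift = max(0.0, log_a, log_b)        # avoids overflow in exp
    w_free = math.exp(-shift)
    w_vir = math.exp(log_a - shift)
    w_pro = math.exp(log_b - shift)
    total = w_free + w_vir + w_pro
    return w_vir / total, w_pro / total


def virion_fraction(x_mix, ln_zv=14.0, ln_zpv=13.9, m_cp=180, n_cp=240,
                    phi_rna=1.0e-6, iters=200):
    """Fraction E(X) of RNA in virions for mixing ratio X = [CP]/(N [RNA])."""
    lo, hi = -60.0, math.log(x_mix * n_cp * phi_rna)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        p_vir, p_pro = _occupations(mid, ln_zv, ln_zpv, m_cp, n_cp)
        # Free CPs plus bound CPs, per N RNA binding sites.
        bound = math.exp(mid) / (n_cp * phi_rna) + (m_cp / n_cp) * p_vir + p_pro
        if bound < x_mix:
            lo = mid
        else:
            hi = mid
    p_vir, _ = _occupations(0.5 * (lo + hi), ln_zv, ln_zpv, m_cp, n_cp)
    return p_vir


if __name__ == "__main__":
    for x_mix in (0.25, 0.5, 0.7, 0.75, 0.8, 0.9):
        print(x_mix, virion_fraction(x_mix))
```

With these choices X* = 0.75, and the computed fraction rises roughly as
X/X* below X*, peaks close to one near X*, and falls roughly as
(1 − X)/(1 − X*) above it. The small deviations come from the free CPs, which
the large-binding-energy limit neglects.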


XII. SURFACE SEGREGATION 


In this section we discuss surface segregation. Assume that 
CPs on the surface of the globule are described by the modi¬ 
fied van der Waals system of the proposed model: 


f{P2,ip)/kBT =p2 In 


P2D^ \ 
1-P2D^) 


+ Bpl . 


+ 2p2<31n 


^P2QIb ^ 


(XII. 1) 


The surface energy density is plotted in Fig. 20 as a function of the area
density for increasingly negative values of B. The surface CPs are in
chemical equilibrium with the CPs in the interior of the globule, which have
a chemical potential μ ≃ γ_0 D^2. For |B| small compared to
B* ≈ −k_BT D^2 Q, the repulsive electrostatic interactions dominate. The
condition for phase equilibrium with the bulk produces a relatively low
surface density (indicated by the small solid dots in panels a to d). The
solid dots in the main figure, which all indicate states with low surface
density, form an almost continuous string. As B becomes more negative,
attractive interactions generate a negative curvature for intermediate
surface densities. There is a specific value B*, of the order of a few times
Q D^2, where the interior coexists with both the low surface density state
and the high surface density state. The solid tangent line coincides with the
dashed common tangent. For larger values of |B|, only the high surface
density state remains.

FIG. 20. Bottom panel: surface free energy as a function of CP surface
density ρ for increasingly negative values of B.

TABLE I. Table of Latin symbols

Symbol          Meaning
A               surface area
B               second virial coefficient
B*              value of the second virial coefficient at the critical point
b               spacing between charges
c_0             1/virion volume
c_s             salt concentration
C_p             solution concentration of aggregates containing p proteins
D^2             excluded CP area on capsid surface
E_v             cohesion energy of a virion
F               solution free energy
F_F(R)          Flory free energy of a vRNA molecule
F_PB            PB electrostatic free energy
F(R,x)          variational free energy of an aggregate
Fo              surface free energy
f(ρ_2)          surface free energy area density
1/5             average number of segments between branch points
K_h             Helfrich bending energy of a CP shell
l               length of one segment
l_B             Bjerrum length
M               number of CPs per T=3 virion (180)
N               maximum number of CPs per RNA molecule
n               number of CPs whose tail groups are in contact with the
                vRNA molecule
Q*              effective macroion charge
Q_t > 0         charge of a CP tail group in units of elementary charge e
−Q_h < 0        charge of a CP head group in units of elementary charge e
Q = Q_t = Q_h   approximated value of the tail and head group charges
[RNA]_f         solution concentration of CP-free vRNA molecules
R               radius of gyration of a vRNA molecule
R_s             mean radius of curvature of the shell
R_c             radius of a T=3 capsid
R_g             equilibrium radius of gyration of a vRNA molecule
S               spanning distance of branched RNA in units of segments
u               affinity between CP head groups
V               second virial coefficient of RNA segments
V_a             contribution to the second virial coefficient of aggregates
                due to CP-CP attraction
V_e             contribution to the second virial coefficient of aggregates
                due to CP-CP electrostatic repulsion
V_eff(x)        second virial coefficient of an aggregate
V_t             second virial coefficient of RNA segments due to CP
                tail-induced attraction
V_1             second virial coefficient of a saturated aggregate
W               third virial coefficient of RNA segments
W_eff(x)        third virial coefficient of an aggregate
X               concentration ratio of CPs and vRNA segments (macroscopic)
x               ratio of the number of CPs and vRNA segments of an
                aggregate (microscopic)

TABLE II. Table of Greek symbols 

Symbol                      Meaning
$\alpha$                    charging parameter
$\alpha \gg 1$              strongly charged regime
$\alpha \ll 1$              weakly charged regime
$\gamma_0$                  surface tension of a vRNA globule in poor solvent
$\Delta F$                  change in free energy of the vRNA molecule upon assembly
$\Delta\psi$                angular width of the directional CP-CP pair potential
$\epsilon$                  affinity between RNA segments and CP tail groups
$\epsilon_0$                dielectric constant of water
$\kappa$                    inverse Debye screening length
$\lambda = 1/(2\pi\sigma l_B)$  Gouy-Chapman length
$\mu$                       chemical potential of CPs in the presence of vRNA molecules
$\mu_{CP}$                  chemical potential of CPs in the absence of vRNA molecules
$\xi_s$                     correlation length or blob size of a vRNA molecule in poor solvent
$\xi = l_B/b$               Manning parameter, fraction of RNA charges not compensated by condensed counterions
$\Pi$                       two-dimensional surface pressure
$\rho_3$                    vRNA segment density
$\rho_2$                    number density of CPs on globule surface
$\sigma$                    surface charge density, in units of elementary charge e
$\phi_{CP}$                 solution concentration of capsid proteins (CPs)
$\phi_{CP}$                 dimensionless solution concentration of capsid proteins (CPs)
$\phi_f$                    solution concentration of unbound capsid proteins (CPs)
$\phi_f$                    dimensionless solution concentration of unbound capsid proteins (CPs)
$\phi_{RNA}$                solution concentration of vRNA molecules
$\phi_{RNA}$                dimensionless solution concentration of vRNA molecules
                            critical protein concentration for virion assembly
$\psi$                      relative angle between the normals of adjacent capsid proteins
                            relative angle between the normals of adjacent capsid proteins of the virion
$\Omega(R)$                 volume of a sphere of radius R
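
The electrostatic length scales listed in the tables can be evaluated numerically for orientation. The sketch below assumes SI constants, water at 298.15 K with relative permittivity 78.4, and a monovalent salt; the function names, the salt concentration (0.1 M) and the surface charge density (one elementary charge per nm^2) are illustrative assumptions and are not values taken from the text.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol

def bjerrum_length(T=298.15, eps_r=78.4):
    """Bjerrum length l_B = e^2 / (4 pi eps0 eps_r k_B T), in metres."""
    return E_CHARGE**2 / (4.0 * math.pi * EPS0 * eps_r * K_B * T)

def debye_kappa(c_salt_molar, T=298.15, eps_r=78.4):
    """Inverse Debye screening length kappa (1/m) for a monovalent salt,
    kappa^2 = 8 pi l_B n, with n the number density of salt ion pairs."""
    n = c_salt_molar * 1e3 * N_A          # mol/L -> ion pairs per m^3
    return math.sqrt(8.0 * math.pi * bjerrum_length(T, eps_r) * n)

def gouy_chapman_length(sigma, T=298.15, eps_r=78.4):
    """Gouy-Chapman length lambda = 1 / (2 pi sigma l_B), in metres.
    sigma is the surface charge density in elementary charges per m^2."""
    return 1.0 / (2.0 * math.pi * sigma * bjerrum_length(T, eps_r))

if __name__ == "__main__":
    l_b = bjerrum_length()
    kappa = debye_kappa(0.1)               # illustrative 0.1 M monovalent salt
    lam = gouy_chapman_length(1.0e18)      # illustrative 1 charge per nm^2
    print("Bjerrum length (nm):", l_b * 1e9)
    print("Debye length (nm):", 1e9 / kappa)
    print("Gouy-Chapman length (nm):", lam * 1e9)
```

Comparing the Gouy-Chapman length with the Debye length gives a quick indication of whether a surface is in the strongly or weakly charged regime for the chosen parameters.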




